Information Technology and Libraries | September 2021
Accessibility of Tables in PDF Documents | Fayyaz, Khusro, and Ullah

Microsoft Word has an option, "text alternative," to add a description of a table or figure for visually impaired people, who use screen readers to read the document. Adobe Acrobat also has an accessibility pane to tag tables and add alternative text and descriptions of tables, which the NVDA screen reader uses to read them aloud. Moreover, CommonLook Office, whose motto is "build accessibility into documents early," has add-ins for Microsoft Word and PowerPoint that add enough accessibility content to the documents to make the resulting PDF accessible. However, already-developed unstructured documents, without any accessibility features, still need some measures to make them understandable to visually impaired or blind users. Keeping in mind the statistics on visually impaired people and the unstructured data of the future (the global datasphere will grow from 33 ZB to 175 ZB, and 80% of this worldwide data will be unstructured), visually impaired individuals cannot be ignored in their access to knowledge.68 Therefore, we need mechanisms for making these unstructured documents understandable to as many people as possible by incorporating accessibility measures into document readers. The following section highlights some of the key issues in this domain.

Table 1. Solutions and libraries for table extraction and processing.

| S. No. | Tool | Open source | Image based | Comments |
|--------|------|-------------|-------------|----------|
| 1 | Tabula | Y | N | Extracts data tables from PDF and saves them as CSV or Excel spreadsheets. Works on native PDF files and cannot extract scanned tables. Supports multiple platforms but not batch processing. |
| 2 | PDFTables | N | N | Extracts page, table, table row, and even table cell. A fully automated API. Supports multiple platforms and programming languages. |
| 3 | Docparser | N | Y | Extracts information from images and forms. A cloud-based application that supports batch processing. Parses documents and offers more features but needs human intervention. Shows poor accuracy on handwritten application forms. |
| 4 | PDFTron | N | N | Supports multiple platforms and programming languages. |
| 5 | Camelot | Y | Y | A Python library that extracts tables from images. Has built-in OCR. |
| 6 | Excalibur | Y | Y | A web-based solution powered by Camelot. |
| 7 | PyPDF2 | Y | N | A Python library that can batch-process multiple files. |
| 8 | pdfplumber | Y | Y | A Python library built on PDFMiner. |
| 9 | PDF Table Extractor | Y | N | A web-based tool built on Tabula. Supports scraping of multipage tables and comparison of cell values. |
| 10 | PDFMiner | Y | N | A Python library that extracts information such as location, fonts, and lines of text. Focuses on analyzing text, includes a PDF parser, and figures out the semantic relationships among structured tables. |

Issues and Challenges in the Existing Systems

Tables can be utilized in multiple scenarios, including information extraction, table search, ontology engineering, conversion to DBMS, and document engineering.69 The situation becomes difficult when a blind or visually impaired person needs to understand a table. The issues and challenges in dealing with PDF tables are categorized in the following sections.
Table Structure

Tables in PDF documents need more focus on table structure detection because they do not follow a defined formal structure.70 Several knowledge gaps are identified in the literature regarding table structure, such as the identification of functional areas of tables, for which Silva argued for the use of multiple heuristics and machine learning algorithms in parallel or in sequence.71 The variety of structural layouts creates problems in their identification, which can be handled by defining more rules at the lexical and syntactic layers of table processing. This could also be fruitful for better semantic annotations.72 In addition, the variety of cell content, or inconsistent cell content, along with implicit header cells, creates problems in understanding tables, especially by machines.73 The vector representation of web tables may be applied to PDF tables for semantic annotation and identification of column types.74 Along with that approach, graphical representation and graph neural networks (GNNs) can also be used for better structure identification in multiple domains.75 New data sets need to be introduced for structure recognition in various domains, including business and finance, as they use a huge number of tables in their documents.76 From the discussion above, table structure inconsistencies, cell content inconsistencies, and the functional and logical processing of tables need more research effort to eliminate the stated problems. Along with that, the inclusion of more data sets will help in handling the diversity in the field.
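As a small illustration of the heuristic, rule-based side of table structure recognition, column boundaries can often be recovered by clustering word positions along the x-axis. The sketch below is hypothetical: it assumes word bounding boxes have already been extracted (for example, by one of the tools in Table 1), and real extractors combine many such rules with fuller layout analysis.

```python
# Heuristic column detection: cluster words into columns by their
# left x-coordinate. A purely illustrative sketch, not any specific
# tool's algorithm.

def detect_columns(words, gap=20):
    """words: list of (x_left, text) tuples from one table row region.
    Words whose left edges are within `gap` units join the same column."""
    columns = []
    for x, text in sorted(words):
        if columns and x - columns[-1][0] <= gap:
            columns[-1][1].append(text)   # close to previous x: same column
        else:
            columns.append((x, [text]))   # large horizontal jump: new column
    return [" ".join(texts) for _, texts in columns]

# Example: three visual columns near x = 10, 120, 240
row = [(10, "Tabula"), (122, "Y"), (240, "extracts"), (255, "tables")]
print(detect_columns(row))  # ['Tabula', 'Y', 'extracts tables']
```

A real pipeline would apply this across all rows and reconcile the column boundaries, which is exactly where the layout variety discussed above makes fixed rules break down.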
Table Formats

The existing format of tables in PDF lacks the metadata needed for further processing; therefore, the conversion of PDF tables to other formats, especially open formats, will open new possibilities. Some researchers have worked on converting tables to CSV format, which retains the basic structure but lacks some cell formatting. Others worked on the transformation of web tables into relational tables for easy manipulation.77 In contrast, XML can handle complex data and is more easily read by humans; therefore, a methodology has been presented for working with tables in XML format, but it considers only tables containing text and numerical data.78 JSON, another format, can be used as an alternative to XML; it is smaller in size than XML and can handle complex and hierarchical data. The JSON format has less support than XML but is preferred for web applications due to its interoperability and lightweight nature.

Table Interpretation

The variable representation patterns of table values, dense content, and natural language processing create problems in the correct interpretation of tables.79 Anaphora resolution techniques and document-level discourse parsers have been suggested to handle complex references across multiple domains.80 Moreover, handling the locality features of a table and annotating its property features can lead to better interpretation of tables.81 The use of a knowledge base is suggested for understanding and annotating the relationships among tables and text to get more information about the entities extracted from tables and text.82 Similarly, the extraction of data, and its precision, in medical and financial tables is an issue that needs the attention of researchers, as both fields have crucial and important data in their tables.83
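The open-format conversions discussed under Table Formats can be sketched with standard-library tools. The table content below is hypothetical; a real pipeline would take the header and rows from a table extractor.

```python
import csv
import io
import json

# Hypothetical extracted table: first row is the header.
header = ["Country", "Population (millions)"]
rows = [["Pakistan", "220.9"], ["Iceland", "0.36"]]

# CSV keeps the grid but drops formatting and hierarchy.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON maps each row onto the header, producing self-describing
# records that web applications can consume directly.
records = [dict(zip(header, row)) for row in rows]
json_text = json.dumps(records, indent=2)

print(csv_text)
print(json_text)
```

Note how the JSON records carry their column names with every value, which is what makes the format attractive for downstream navigation and querying, at the cost of repeating the header in each record.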
For easy interpretation of tables, machine learning classifiers based on table headings and captions can be used to classify tables into their respective domains.84 The relationships of tables within a specific domain, or among multiple domains, can be established by developing ontologies.85 This will enable tables to be published on an LOD cloud, which will establish more relationships and allow insights to be inferred from multiple domains.

Table Evaluation

Most of the researchers working on PDF tables have tried to evaluate their work with popular data sets such as ICDAR 2013, ICDAR 2015, ICDAR 2017 POD, PubMed, UNLV, and Mormont. As we have PDF documents in multiple domains, new data sets should be introduced for structure recognition, especially in business and finance, as these domains use a large number of tables in their documents.86 An evaluation methodology has been proposed for table detection, structure recognition, and functional and semantic analysis.87 Unfortunately, there are no standard metrics, parameters, or formal methodology for evaluating table processing.88 Therefore, standard evaluation metrics should be defined for PDF tables in order to standardize the evaluation of algorithms and frameworks.

Table Presentation to Blind and Visually Impaired Users

The available tools and techniques for reading documents aloud to blind and visually impaired people either read the table caption only and ignore the content, or treat tables as text and read the rows line by line. This does not help these users understand the semantics of the table and its content.
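The difference between the line-by-line reading that current tools produce and a header-aware rendering can be illustrated with a small sketch; the table content here is hypothetical.

```python
# Contrast flat row reading with header-aware reading, which pairs
# every cell with its column header, the way accessible renderings do.

def read_flat(table):
    """What many tools do today: each row read as undifferentiated text."""
    return [" ".join(row) for row in table]

def read_with_headers(table):
    """Header-aware reading: each cell is announced with its column name."""
    header, *body = table
    return [
        f"Row {i}: " + "; ".join(f"{h}: {cell}" for h, cell in zip(header, row))
        for i, row in enumerate(body, start=1)
    ]

table = [["Tool", "Open source"], ["Tabula", "Yes"], ["Docparser", "No"]]
print(read_flat(table))          # ['Tool Open source', 'Tabula Yes', 'Docparser No']
print(read_with_headers(table))  # ['Row 1: Tool: Tabula; Open source: Yes', ...]
```

Even this trivial pairing restores the column-to-cell association that a sighted reader infers visually; the research challenges discussed above (spanning cells, implicit headers, nested layouts) are precisely what make the general case hard.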
Besides the content of a table, its layout shows groupings and connections among the content, which current solutions do not present to blind and visually impaired people.89 Therefore, tools and screen readers need to present tables in a nonvisual format, or give a summarized view of tables by following the W3C guidelines, instead of reading the table like text.90 The summarized view of tables can become part of bibliographic metadata and can contribute to cataloging from the perspective of linked and open data.91 A study highlighted the accessibility of PDF articles published by four journal publishers and presented the findings in graphs to show the trend from 2009 to 2013, using parameters including meaningful title, alternate text for images, and logical reading order.92 The author further applied the same methodology to analyze the articles published in the next four years (2014 to 2018) and concluded that the accessibility of PDF documents had improved. However, the journal publishers, who should be more aware of disability and accessibility, did not consistently follow the PDF/UA accessibility requirements and WCAG 2.0 when producing PDF versions of their articles.93 Therefore, visually impaired individuals should be provided with a mechanism for understanding digital content and its underlying semantics at multiple levels of abstraction, such as general information about the document and its elements (including tables), its structure and content, navigation within the table, and querying the table to get more details and lessen cognitive overload.

Accessibility of Digital Library Collections

The accessibility of large-scale digital library collections can enhance content for sighted as well as visually impaired users.
The traditional utilization of digital library collections needs to be broadened by making computation-ready collections meant to be used and consumed in multiple domains.94 Researchers made an effort to digitize and archive a digital repository of images and convert them to PDF/A documents but, unfortunately, ended up with limited semantics because they did not consider the elements within the documents themselves.95 The accessibility of these converted documents may be compromised by these limited semantics. The rich semantics of tables can be used in the bibliographic classification of a digital library's collection to increase the search breadth of the digital library.96 Blind and visually impaired users can be assisted in using digital libraries, as they may need help at the physical and cognitive levels. At the physical level, blind users may face difficulty in accessing information, identifying path and status, and efficiently evaluating information. At the cognitive level, they may face problems in understanding the multiple structures, programs, information, and features of the digital library, and in the need to stick to specific formats. Therefore, the inclusion of help features, along with meaningful descriptions for nontextual elements, will make the digital library friendly to blind and visually impaired people.97 The sight-centered nature of the digital library creates problems for blind and visually impaired users due to missing textual or verbal instructions.
Some researchers identified the inclusion of labels and meaningful descriptions for hyperlinks, instructions, structure, multimedia content, and nontext content as a way to make digital libraries friendly to blind and visually impaired people.98 At the same time, others argue for improving usability by introducing help features in terms of usefulness, ease of use, and user satisfaction.99 The accessibility of digital libraries in general, and of their content in particular, may be improved by accommodating help features in the interface and meaningful descriptions for the nontext elements of the content, including tables.

Conclusions and Future Research Directions

This study discusses the accessibility of tables included in PDF documents in general, as well as in the specific environment of digital libraries. Existing frameworks, algorithms, and solutions for the processing and interpretation of PDF tables, and specifically their presentation to blind and visually impaired people, are thoroughly discussed. A general workflow of table processing is also presented in Figure 1. The available solutions for reading PDF documents aloud to blind and visually impaired people are analyzed for their output, specifically for how they handle tables. Furthermore, a list of resources for table interpretation and presentation is discussed, along with their features. The issues and challenges in table structure, format, interpretation, evaluation, presentation to blind users, and the accessibility of digital library collections are discussed. Researchers working in the domains of accessibility, digital libraries, and PDF tables can extend and modify the current solutions and algorithms by following the future research directions given below.

• The structure of a table carries implicit semantic information which a sighted reader can infer but a blind reader needs assistance to understand.
The structure of a PDF table is extracted using multiple approaches, such as heuristics, ontologies, machine learning, and segmentation, whereas vectors are used for web tables.100 Therefore, combinations of multiple approaches and the use of vectors for PDF tables may produce better results.

• The content of a table is usually numeric or very short text and needs proper interpretation. Therefore, a knowledge base can be used to get more information about the entities extracted from tables and text in order to understand and annotate the relationships among tables and text.101 These knowledge bases can be predetermined or may be selected automatically according to the table content or domain.

• Table interpretation becomes easier if tables are classified according to their domains using machine learning classifiers. The classification can be based on table headings and captions, as well as the title and author of the document.102

• Ontologies are used to relate tables within a specific domain or among multiple domains, and publishing them on an LOD cloud will establish new relationships.103 This will help in inferring new insights from complex, long, and numerical tables.

• Unstructured data and content can be made available for multiple uses and interpretations if converted to open formats like CSV, JSON, and XML.104 Among these, CSV comes with repeated content, XML needs special parsers, whereas JSON is lightweight and easy to write and read.105 It has support from NoSQL databases like MongoDB and Apache CouchDB, and from web application APIs like Twitter, YouTube, and Facebook. Therefore, JSON might be the better option for the conversion of PDF tables, for its multiple interpretations and for navigation within tables.
• The processes used for the evaluation of tables have no defined metrics.106 Therefore, table evaluation processes should be defined with their respective metrics in order to standardize research in this domain.

• The precision of the extracted content of a table is crucial, especially in medical, financial, and experimental tables that contain numeric data. Therefore, the preprocessing of tables, or their conversion to other formats, needs more attention to avoid any truncation or rounding off of the data.

• The presentation of tables to blind or visually impaired people can be in nonvisual or summarized form.107 The summaries may be presented nonvisually, including the structural layout as well as a brief introduction to the table, to minimize the cognitive overload on these individuals.

• To evaluate the accessibility of digital library interfaces, 16 heuristics were proposed to put digital libraries within reach of users; however, more heuristics are needed to make interfaces generalized for all individuals.108

• The nontext elements of digital library collections should have meaningful descriptions for better understandability by blind and visually impaired individuals. The user-generated content about these nontext elements could be used for cataloging.109

• The rich semantics of tables can be exploited for cataloging and classification, which will be helpful in exploratory searching.

• As the Michigan State University Libraries have taken the initiative of assessing and improving the accessibility of digital library content by adopting the WCAG guidelines, other libraries can also adopt this model to provide accessible content to their users, including blind and visually impaired individuals.

• The development of new data sets for tables in multiple domains can facilitate researchers in interpreting tables and establishing cross-domain relationships.
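The call above for defined evaluation metrics can be made concrete: a common starting point in the table-recognition literature is cell-level precision, recall, and F1 against a ground-truth grid. The sketch below is illustrative only, with hypothetical tables, and real benchmarks additionally score cell adjacency and spanning.

```python
# Cell-level evaluation sketch: compare a predicted table against a
# ground-truth table by matching (row, column, text) triples.

def cell_set(table):
    """Flatten a table (list of rows) into (row, col, text) triples."""
    return {(r, c, cell)
            for r, row in enumerate(table)
            for c, cell in enumerate(row)}

def evaluate(predicted, truth):
    p, t = cell_set(predicted), cell_set(truth)
    correct = len(p & t)                       # cells in the right place with the right text
    precision = correct / len(p) if p else 0.0
    recall = correct / len(t) if t else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

truth = [["Name", "Age"], ["Alice", "30"]]
predicted = [["Name", "Age"], ["Alice", "3O"]]  # one OCR error: "3O" vs "30"
print(evaluate(predicted, truth))  # (0.75, 0.75, 0.75)
```

A shared metric like this is what lets results from different detection and structure-recognition algorithms be compared at all; without one, reported accuracies remain incommensurable.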
This review paper is an attempt to highlight the knowledge gaps in processing PDF tables and in their accessibility for blind and visually impaired individuals. An efficient, open-source solution for making PDF documents accessible to blind and visually impaired people needs to exploit heuristics, ontologies, machine learning, and deep learning, using open-source libraries and tools for understanding and interpreting tabular content, in order to reduce information overload.

Endnotes

1 Roya Rastan, "Automatic Tabular Data Extraction and Understanding" (PhD diss., University of New South Wales, 2017).

2 Mark T. Maybury, "Communicative Acts for Explanation Generation," International Journal of Man-Machine Studies 37, no. 2 (1992): 135–72.

3 Patricia Wright, "The Comprehension of Tabulated Information: Some Similarities between Reading Prose and Reading Tables," NSPI Journal 19, no. 8 (1980): 25–29, https://doi.org/10.1002/pfi.4180190810.

4 Jean-Claude Guédon et al., Future of Scholarly Publishing and Scholarly Communication: Report of the Expert Group to the European Commission (Brussels: European Commission, Directorate-General for Research and Innovation, 2019), https://doi.org/10.2777/836532.

5 World Health Organization, World Report on Vision, October 8, 2019, https://www.who.int/publications-detail/world-report-on-vision/.

6 Mireia Ribera Turró, "Are PDF Documents Accessible?" Information Technology and Libraries 27, no. 3 (2008): 25–43, https://doi.org/10.6017/ital.v27i3.3246.

7 Kyunghye Yoon, Laura Hulscher, and Rachel Dols, "Accessibility and Diversity in Library and Information Science: Inclusive Information Architecture for Library Websites," Library Quarterly 86, no. 2 (2016): 213–29, https://doi.org/10.1086/685399.
8 Iris Xie et al., "Using Digital Libraries Non-Visually: Understanding the Help-Seeking Situations of Blind Users," Information Research 20, no. 2 (2015): 673.

9 Heidi M. Schroeder, "Implementing Accessibility Initiatives at the Michigan State University Libraries," Reference Services Review 46, no. 3 (2018): 399–413, https://doi.org/10.1108/rsr-04-2018-0043.

10 Joanne Oud, "Accessibility of Vendor-Created Database Tutorials for People with Disabilities," Information Technology and Libraries 35, no. 4 (2016): 7–18, https://doi.org/10.6017/ital.v35i4.9469.

11 Rakesh Babu and Iris Xie, "Haze in the Digital Library: Design Issues Hampering Accessibility for Blind Users," Electronic Library 35, no. 5 (2017): 1052–65, https://doi.org/10.1108/el-10-2016-0209.

12 Rachel Wittmann et al., "From Digital Library to Open Datasets," Information Technology and Libraries 38, no. 4 (2019): 49–61, https://doi.org/10.6017/ital.v38i4.11101.

13 Xinxin Wang, "Tabular Abstraction, Editing, and Formatting" (PhD diss., University of Waterloo, 1996).

14 Rastan, "Automatic Tabular Data Extraction," 25.

15 Azadeh Nazemi, "Non-Visual Representation of Complex Documents for Use in Digital Talking Books" (PhD diss., Curtin University, 2015).

16 Rastan, "Automatic Tabular Data Extraction," 14.

17 Max Göbel et al., "ICDAR 2013 Table Competition," in 2013 12th International Conference on Document Analysis and Recognition (2013): 1449–53, https://doi.org/10.1109/icdar.2013.292.

18 Burcu Yildiz, Katharina Kaiser, and Silvia Miksch, "pdf2table: A Method to Extract Table Information from PDF Files," in Proceedings of the 2nd Indian International Conference on Artificial Intelligence (IICAI, 2005): 1773–85; Tamir Hassan and Robert Baumgartner, "Table Recognition and Understanding from PDF Files," in Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) (2007): 1143–47, https://doi.org/10.1109/icdar.2007.4377094; Alexey Shigarov et al., "TabbyPDF: Web-Based System for PDF Table Extraction," in International Conference on Information and Software Technologies (Springer International Publishing, 2018): 257–69, https://doi.org/10.1007/978-3-319-99972-2_20.

19 Minghao Li et al., "TableBank: Table Benchmark for Image-Based Table Detection and Recognition," preprint, arXiv:1903.01949; Sebastian Schreiber et al., "DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017): 1162–67, https://doi.org/10.1109/icdar.2017.192.

20 Zewen Chi et al., "Complicated Table Structure Recognition," preprint, arXiv:1908.04729.

21 Michael Cafarella et al., "Ten Years of WebTables," in Proceedings of the VLDB Endowment 11, no. 12 (August 2018): 2140–49, https://doi.org/10.14778/3229863.3240492.

22 Shah Khusro, Asima Latif, and Irfan Ullah, "On Methods and Tools of Table Detection, Extraction and Annotation in PDF Documents," Journal of Information Science 41, no. 1 (2015): 41–57, https://doi.org/10.1177/0165551514551903.

23 Hassan, "Table Recognition and Understanding"; Richard Zanibbi, Dorothea Blostein, and James R. Cordy, "A Survey of Table Recognition," Document Analysis and Recognition 7, no. 1 (2004): 1–16, https://doi.org/10.1007/s10032-004-0120-9; Andreiwid Sheffer Corrêa and Pär-Ola Zander, "Unleashing Tabular Content to Open Data: A Survey on PDF Table Extraction Methods and Tools," in Proceedings of the 18th Annual International Conference on Digital Government Research (June 2017): 54–63, https://doi.org/10.1145/3085228.3085278; Christopher Clark and Santosh Divvala, "Looking beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers" (paper, AAAI Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, January 25–26, 2015).

24 Ermelinda Oro and Massimo Ruffolo, "PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents," in 2009 10th International Conference on Document Analysis and Recognition (ICDAR) (2009): 906–10, https://doi.org/10.1109/icdar.2009.12.

25 Vidhya Govindaraju, Ce Zhang, and Christopher Ré, "Understanding Tables in Context Using Standard NLP Toolkits," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Sofia, Bulgaria: Association for Computational Linguistics, August 2013): 658–64.

26 Nikola Milosevic et al., "Disentangling the Structure of Tables in Scientific Literature," in Natural Language Processing and Information Systems, NLDB 2016, Lecture Notes in Computer Science 9612 (Springer, Cham), https://doi.org/10.1007/978-3-319-41754-7_14.

27 Rastan, "Automatic Tabular Data Extraction," 48.
28 Alexey Shigarov, Andrey Mikhailov, and Andrey Altaev, "Configurable Table Structure Recognition in Untagged PDF Documents," in Proceedings of the 2016 ACM Symposium on Document Engineering (2016): 119–22, https://doi.org/10.1145/2960811.2967152.

29 Shigarov et al., "TabbyPDF," 262, 263, 265.

30 Dae Hyun Kim et al., "Facilitating Document Reading by Linking Text and Tables," in Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (October 2018): 423–34, https://doi.org/10.1145/3242587.3242617.

31 Hassan, "Table Recognition and Understanding," 1145.

32 Jing Fang et al., "A Table Detection Method for Multipage PDF Documents via Visual Separators and Tabular Structures," in 2011 International Conference on Document Analysis and Recognition (2011): 779–83, https://doi.org/10.1109/icdar.2011.304.

33 Bahadar Ali and Shah Khusro, "A Divide-and-Merge Approach for Deep Segmentation of Document Tables," in Proceedings of the 10th International Conference on Informatics and Systems (May 2016): 43–49, https://doi.org/10.1145/2908446.2908473.
34 Wenyuan Xue et al., "Table Analysis and Information Extraction for Medical Laboratory Reports," in 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech) (2018): 193–99, https://doi.org/10.1109/dasc/picom/datacom/cyberscitec.2018.00043.

35 Roya Rastan, Hye-Young Paik, and John Shepherd, "TEXUS: A Unified Framework for Extracting and Understanding Tables in PDF Documents," Information Processing & Management 56, no. 3 (2019): 895–918, https://doi.org/10.1016/j.ipm.2019.01.008.

36 Dafang He et al., "Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection," in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) (2017): 254–61, https://doi.org/10.1109/icdar.2017.50.

37 Jing Fang et al., "Table Header Detection and Classification," in Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (July 2012): 599–605.

38 He et al., "Multi-Scale Multi-Task," 255.

39 Martha O. Perez-Arriaga, Trilce Estrada, and Soraya Abad-Mota, "TAO: System for Table Detection and Extraction from PDF Documents," Florida Artificial Intelligence Research Society Conference, North America (2016).

40 Saman Arif and Faisal Shafait, "Table Detection in Document Images Using Foreground and Background Features," in 2018 Digital Image Computing: Techniques and Applications (DICTA) (2018): 1–8, https://doi.org/10.1109/dicta.2018.8615795.

41 Schreiber et al., "DeepDeSRT," 1163, 1164.
42 Shoaib Ahmed Siddiqui et al., "DeCNT: Deep Deformable CNN for Table Detection," IEEE Access 6 (2018): 74151–61, https://doi.org/10.1109/access.2018.2880211.

43 Chi et al., "Complicated Table Structure Recognition."

44 Rahul Anand, Hye-Young Paik, and Cheng Wang, "Integrating and Querying Similar Tables from PDF Documents Using Deep Learning," preprint, arXiv:1901.04672.

45 Jiaoyan Chen et al., "ColNet: Embedding the Semantics of Web Tables for Column Type Prediction," in Proceedings of the AAAI Conference on Artificial Intelligence 33, no. 1: 29–36, https://doi.org/10.1609/aaai.v33i01.330129.

46 Ziqi Zhang, "Towards Efficient and Effective Semantic Table Interpretation," in International Semantic Web Conference (2014): 487–502, https://doi.org/10.1007/978-3-319-11964-9_31.

47 Ivan Ermilov, Sören Auer, and Claus Stadler, "User-Driven Semantic Mapping of Tabular Data," in Proceedings of the 9th International Conference on Semantic Systems (September 2013): 105–12, https://doi.org/10.1145/2506182.2506196.

48 Martha O. Perez-Arriaga, Trilce Estrada, and Soraya Abad-Mota, "Table Interpretation and Extraction of Semantic Relationships to Synthesize Digital Documents," in Proceedings of the 6th International Conference on Data Science, Technology and Applications (DATA) (2017): 223–32, https://doi.org/10.5220/0006436902230232.

49 Varish Mulwad, "TABEL: A Domain-Independent and Extensible Framework for Inferring the Semantics of Tables" (PhD diss., University of Maryland, 2015).
50 Syed Tahseen Raza Rizvi et al., "Ontology-Based Information Extraction from Technical Documents," in Proceedings of the 10th International Conference on Agents and Artificial Intelligence (ICAART) (2018): 493–500, https://doi.org/10.5220/0006596604930500.

51 Corrêa and Zander, "Unleashing Tabular Content to Open Data," 55.

52 Irfan Ullah et al., "An Overview of the Current State of Linked and Open Data in Cataloging," Information Technology and Libraries 37, no. 4 (2018): 47–80, https://doi.org/10.6017/ital.v37i4.10432.

53 Nosheen Fayyaz, Irfan Ullah, and Shah Khusro, "On the Current State of Linked Open Data: Issues, Challenges, and Future Directions," International Journal on Semantic Web and Information Systems (IJSWIS) 14, no. 4 (2018): 110–28, https://doi.org/10.4018/ijswis.2018100106.

54 Govindaraju, Zhang, and Ré, "Understanding Tables in Context Using Standard NLP Toolkits," 660, 661.

55 Perez-Arriaga, Estrada, and Abad-Mota, "Table Interpretation and Extraction," 227.

56 Kim et al., "Facilitating Document Reading," 425, 426.

57 Rastan, Paik, and Shepherd, "TEXUS," 906.

58 Nikola Milosevic et al., "A Framework for Information Extraction from Tables in Biomedical Literature," International Journal on Document Analysis and Recognition (IJDAR) 22, no. 1 (2019): 55–78, https://doi.org/10.1007/s10032-019-00317-0.
59 chi et al., “complicated table structure recognition.” 60 wenhao yu et al., “tablepedia: automating pdf table reading in an experimental evidence exploration and analytic system,” in the world wide web conference (may 2019): 3615–19, https://doi.org/10.1145/3308558.3314118. 61 anand, paik, and wang, “integrating and querying similar tables.” 62 turró, “are pdf documents accessible?” 2, 4. 63 nazemi, “non-visual representation of complex documents,” 110, 111, 112, 118. 64 juan cao, “generating natural language descriptions from tables,” ieee access 8 (2020): 46206–16, https://doi.org/10.1109/access.2020.2979115. 65 maartje ter hoeve et al., “conversations with documents: an exploration of document-centered assistance,” in proceedings of the 2020 conference on human information interaction and retrieval (march 2020): 43–52, https://doi.org/10.1145/3343413.3377971. 66 guédon et al., “future of scholarly publishing,” 42. 67 w3c, “wcag 2.0.” 68 world health organization, “world report on vision”; david reinsel, john gantz, and john rydning, “data age 2025: the digitization of the world, from edge to core,” idc white paper, #us44413318 (framingham, ma: idc, november 2018), https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataagewhitepaper.pdf/. 69 rastan, “automatic tabular data extraction,” 18, 19. 70 arif and shafait, “table detection in document images,” 1. 71 ana costa e silva, “parts that add up to a whole: a framework for the analysis of tables,” (phd diss., edinburgh university, uk, 2010). 72 milosevic et al., “a framework for information extraction from tables,” 60. 73 rastan, “automatic tabular data extraction,” 14. 74 chen et al., “colnet,” 31. 75 mulwad, “tabel,” 23; zewen, “complicated table structure recognition.” 76 siddiqui et al., “decnt,” 74160. 
https://doi.org/10.1007/s10032-019-00317-0 https://doi.org/10.1145/3308558.3314118 https://doi.org/10.1109/access.2020.2979115 https://doi.org/10.1145/3343413.3377971 https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf/ https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf/ information technology and libraries september 2021 accessibility of tables in pdf documents | fayyaz, khusro, and ullah 19 77 david w embley, sharad seth, and george nagy, “transforming web tables to a relational database,” 2014 22nd international conference on pattern recognition (2014) 2781–86, https://doi.org/10.1109/icpr.2014.479. 78 milosevic et al., “a framework for information extraction from tables,” 56. 79 milosevic et al., “a framework for information extraction from tables,” 55, 56. 80 kim et al., “facilitating document reading,” 432. 81 chen et al., “colnet,” 36. 82 asima latif et al., “a hybrid technique for annotating book tables,” int. arab j. inf. technol 15, no. 4 (2018): 777–83. 83 rastan, paik, and shepherd, “texus,” 909. 84 milosevic et al., “a framework for information extraction from tables,” 61, 62, 65, 66. 85 rizvi et al., “ontology-based information extraction,” 496. 86 siddiqui et al., “decnt,” 74160. 87 max göbel et al., “a methodology for evaluating algorithms for table understanding in pdf documents,” in proceedings of the 2012 acm symposium on document engineering (september 2012): 45–48, https://doi.org/10.1145/2361354.2361365. 88 rastan, paik, and shepherd, “texus,” 917. 89 david pinto et al., “table extraction using conditional random fields,” in proceedings of the 26th annual international acm sigir conference on research and development in information retrieval (july 2003): 235–42, https://doi.org/10.1145/860435.860479. 
90 nazemi, “non-visual representation of complex documents,” 118–44; w3c, “wcag 2.0.” 91 ullah et al., “current state of linked and open data in cataloging,” 47, 48. 92 julius t. nganji, “the portable document format (pdf) accessibility practice of four journal publishers,” library and information science research 37, no.3 (2015): 254–62, https://doi.org/10.1016/j.lisr.2015.02.002. 93 julius t. nganji, “an assessment of the accessibility of pdf versions of selected journal articles published in a wcag 2.0 era (2014–2018),” learned publishing 31, no. 4 (2018): 391–401, https://doi.org/10.1002/leap.1197. 94 wittmann et al., “from digital library to open datasets,” 49, 50. 95 yan han and xueheng wan, “digitization of text documents using pdf/a,” information technology and libraries 37, no. 1 (2018): 52–64, https://doi.org/10.6017/ital.v37i1.9878. https://doi.org/10.1109/icpr.2014.479 https://doi.org/10.1145/2361354.2361365 https://doi.org/10.1145/860435.860479 https://doi.org/10.1016/j.lisr.2015.02.002 https://doi.org/10.1002/leap.1197 https://doi.org/10.6017/ital.v37i1.9878 information technology and libraries september 2021 accessibility of tables in pdf documents | fayyaz, khusro, and ullah 20 96 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930. 97 xie et al., “using digital libraries non-visually,” paper 673. 98 babu and xie, “haze in the digital library,” 1057–59. 99 iris xie et al., “enhancing usability of digital libraries: designing help features to support blind and visually impaired users,” information processing and management 57, no. 3 (2020): 102110, https://doi.org/10.1016/j.ipm.2019.102110. 100 chen et al., “colnet,” 31, 32. 101 kim et al., “facilitating document reading,” 432. 102 milosevic et al., “a framework for information extraction from tables,” 61. 
Applying Gamification to the Library Orientation: A Study of Interactive User Experience and Engagement Preferences

Karen Nourse Reed and A. Miller

Information Technology and Libraries | September 2020. https://doi.org/10.6017/ital.v39i3.12209

Karen Nourse Reed (karen.reed@mtsu.edu) is Associate Professor, Middle Tennessee State University. A. Miller (a.miller@mtsu.edu) is Associate Professor, Middle Tennessee State University. © 2020.

Abstract

By providing an overview of library services as well as the building layout, the library orientation can help newcomers make optimal use of the library. The benefits of this outreach can be curtailed, however, by the significant staffing required to offer in-person tours. One academic library overcame this issue by turning to user experience research and gamification to provide an individualized online library orientation for four specific user groups: undergraduate students, graduate students, faculty, and community members. The library surveyed 167 users to investigate preferences regarding orientation format, as well as likelihood of future library use as a result of the gamified orientation format. Results demonstrated a preference for the gamified experience among undergraduate students as compared to other surveyed groups.

Introduction

Background

Newcomers to the academic campus can be a bit overwhelmed by their unfamiliar environment: there are faces to learn, services and processes to navigate, and an unexplored landscape of academic buildings to traverse. Whether one is an incoming student or a recently hired employee of the university, all need to become quickly oriented to their surroundings to ensure productivity. In the midst of this transition, the academic library may or may not be on the list of immediate inquiries; however, the library is an important place to start. Newcomers would be wise to familiarize themselves with the building and its services so that they can make optimal use of its offerings. Two studies found that students who used the library received better grades and had higher retention rates.1 Another study regarding university employees revealed that untenured faculty made less use of the library than tenured faculty, a problem attributed to lack of familiarity with the library.2 Researchers have also found that faculty will often express interest in different library services without realizing that these services are in fact available.3 It is safe to say that libraries cannot always rely on newcomers to discover the physical and electronic services on their own; they need to be shown these items in order to mitigate the risk of unawareness.

In consideration of these issues, the Walker Library at Middle Tennessee State University (MTSU) recognized that more could be done to welcome its new arrivals to campus. The public university enrolls approximately 21,000 students, the majority of whom are undergraduates. However, with a Carnegie classification of Doctoral/Professional and over one hundred graduate degree programs, there was a strong need for specialized research among the university's graduate students and faculty. Other groups needed to use the library too: non-faculty employees on campus as well as community users who frequently used Walker Library for its specialized and general collections. The authors realized that when new members of these different groups arrived on campus, few opportunities were available for acclimation to the library's services or building layout. Limited orientation experiences were conducted within library instruction classes, but these sessions primarily taught research skills and targeted freshman general-education classes as well as select upper-division and graduate classes.
In short, it appeared that students, employees, and visitors to the university would largely have to discover the library's services on their own through a search on the library website or an exploration of the physical library. It was very likely that, in doing so, the newcomers might miss out on valuable services and information. As MTSU librarians, the authors felt strongly that library orientations were important to everyone at the university so that they might make optimal use of the library's offerings. The authors based this opinion on their knowledge of relevant scholarly literature as well as their own anecdotal experiences with students and faculty.4 The authors defined the library orientation differently from library instruction: in their view, an orientation should acquaint users with the services and physical spaces of the library, as compared to instruction that would teach users how to use the library's electronic resources such as databases. The desired new approach would structure orientations in response to the different needs of the library's users. For example, the authors found that undergraduates typically had distinct library interests compared to faculty. It was recognized that library orientations were time-consuming for everyone: library patrons at MTSU often did not want to take the time for a physical tour, nor did the library have the staffing to accommodate large-scale requests. The authors turned to the gamification trend, and specifically interactive storytelling, as a solution. Interactive storytelling has previous applications in librarianship as a means of creating an immersive and self-guided user experience.5 However, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups.
To overcome this gap, the authors developed an online, interactive, game-like experience via storytelling software to orient four different groups of users to the library's services. These groups were undergraduate students, graduate students, faculty members (which included both faculty and staff at the university), and community members (i.e., visitors to the university or alumni); see figure 1 for an illustration of each group's game avatars. These groups were invited to participate in the gamified experience called LibGO (short for Library Game Orientation). After playing LibGO, participants gave feedback through an online survey. This paper will give a brief explanation of the creation of the game, as well as describe the results of research conducted to understand the impact of the gamified experience across the four user groups.

Figure 1. LibGO players were allowed to self-select their user group upon entering the game. Each of the four user groups was assigned an avatar and followed a logic path specified for that group.

Literature Review

Traditional Orientation

Searches for literature on library orientation yield very broad and yet limited details about users of the traditional library orientation method.
It is important to note that the terms "library tour" and "library orientation" can be somewhat vague, because this terminology is not interchangeable, yet is frequently treated as such in the literature.6 These terms are often included among library instruction materials, which predominately influence undergraduate students.7 Kylie Bailin, Benjamin Jahre, and Sarah Morris define orientation as "any attempt to reduce library anxiety by introducing students to what a college/university library is, what it contains, and where to find information while also showing how helpful librarians can be."8 Their book is a culmination of case studies of academic library orientation in various forms worldwide, where the common theme across most chapters is the need to assess, revise, and change library orientation models as needed, especially in response to feedback, staff demands, and the evolving trend of libraries and technology.9 Furthermore, the majority of these studies are undergraduate-focused, and often freshman-focused, while only a few studies are geared towards graduate students. Other traditional orientation problems discussed in the literature include students lacking intrinsic motivation to attend library orientation, library staff time required to execute the orientation, and lack of attendance.10 Additionally, among librarians there seems to be consensus that traditional library tours are the least effective means of orientation, yet they are the most highly used, with attention predominately focused on the undergraduate population alone.11

In 1997, Pixey Anne Mosley described the traditional guided library tour as ineffective, and documented the trend of libraries discontinuing it in favor of more active learning options.12 Her study surveyed 44 students who took a redesigned library tour, all of whom were undergraduates (with freshmen as the target population). Although Mosley's study only addressed one group of library users, it does attempt to answer a question on library perception, whereby 93 percent of surveyed students indicated feeling more comfortable in using the library after the more active learning approach.13 A comparison study by Marcus and Beck looked at traditional versus treasure hunt orientations, and ultimately discovered that perception of the traditional method is limited by the selective user population and lack of effective measurements. They cited the need for continued study of alternative approaches to academic library orientation.14 A study by Kenneth Burhanna, Tammy Eschedor Voelker, and Julie Gedeon looked at the traditional library tour from the physical and virtual perspective. Confronted with a lack of access to the physical library, these researchers at Kent State University decided to add an online option for the required traditional freshman library tour.15 Their study compared the efficacy of learning and affective outcomes between face-to-face library tours and those of online library tours. Of the 3,610 students who took the required library tour assignment, 3,567 chose the online tour method and 63 opted or were required to take the in-person, librarian-led tour. Surveys were later sent to a random list of 250 students who did not take the in-person tour and the 63 students who did take the in-person tour.
Of the 46 usable responses, all but one were undergraduates, and 39 (85 percent) of them were freshmen.16 This is a small sample size, with a ratio of slightly greater than 2:1 for online versus in-person tour participation. Although results showed that an instructor's recommendation on format selection was the strongest influencing factor, convenience was also significant for those who selected the online option (81.5 percent). In contrast, only 18.5 percent of the students who took the face-to-face tour rated it as convenient. The authors found that regardless of tour type, students were more comfortable using the library (85 percent) and more likely to use library resources (80 percent) after having taken a library tour. Interestingly, students who took the online tour seemed slightly more likely to visit the physical library than those who took the in-person tour. Ultimately the analysis of both tours showed this method of library orientation encourages library resource use, and the "online tour seems to perform as well, if not slightly better than the in-person tour."17

Gamification Use in Libraries

An alternative format to the traditional method is gamification. Gamification has become a familiar trend within academic libraries in recent years, and most often refers to the use of a technology-based game delivery within an instructional setting. Some users find gamified library instruction to be more enjoyable than traditional methods. For these people, gamification can potentially increase student engagement as well as retention of information.18 The goal of gamification is to create a simplified reality with a defined user experience.
Kyle Felker and Eric Phetteplace emphasized the importance of user interaction over "specific mechanics or technologies" in thinking about the gamification design process.19 Proponents of gamification of library instructional content indicate that it connects to the broader mission of library discovery and exploration as exemplified through collaboration and the stimulation of learning.20 Additional benefits of gamification are its teaching, outreach, and engagement functions.21

Many researchers have documented specific applications of online gaming as a means of imparting library instruction. Mary J. Broussard and Jessica Urick Oberlin described the work of librarians at Lycoming College in developing an online game as one approach to teaching about plagiarism.22 Melissa Mallon offered summaries of nine games produced for higher education, several of which were specifically created for use by academic libraries.23 Many of these online library games used Flash, or required players to download the game before playing. By contrast, J. Long detailed an initiative at Miami University to integrate gamification into library instruction, a project which utilized Twine.24 Twine is an in-browser method and therefore avoids the problem of requiring users to download additional software prior to playing the game.

Other libraries have used online gamification specifically as a tool for library orientations. Although researchers have demonstrated that the library orientation is an important practice in establishing positive first impressions of the library and counteracting library anxiety among new users, the differences between in-person versus online delivery formats are unclear.25 Several successful instances have been documented in which the orientation was moved to an online game format.
Nancy O'Hanlon, Karen Diaz, and Fred Roecker described a collaboration at Ohio State University Libraries between librarians and the Office of First Year Experience; for this project, they created a game to orient all new students to the library prior to arrival on campus.26 The game was called "Head Hunt," and was cited among those games listed in the article by Mallon.27 Anna-Lise Smith and Lesli Baker reported the "Get a Clue" game at Utah Valley University, which oriented new students over two semesters.28 Another orientation game developed at California State University-Fresno was noteworthy for its placement in the university's learning management system (LMS).29

In reviewing the literature regarding online library gamification efforts, there appear to be several best practices. Several studies cite initial student assessment to understand student knowledge and/or perceptions of the content, followed by an iterative design process with a team of librarians and computer programmers.30 Felker and Phetteplace reinforced the need for this iterative process of prototyping, testing, deployment, and assessment as one key to success; however, they also stated that the most prevalent reason for failure is that the games are not fun for users.31 Librarians are information experts, and are not necessarily trained in fun game design. Some libraries have solved this problem by partnering with or hiring professional designers; however, for many under-resourced libraries, this is not an option.32 Taking advantage of open-source tools, as well as the documented trial-and-error practices of others, can be helpful to newcomers who wish to break into new library engagement methods utilizing gamification.

As the literature has shown, a traditional library tour may have a place in the list of library services, but for whom and at what cost are questions with limited answers in studies done to date.
Gamification has offered an alternative perspective, but with narrow accounts of its success in the online storytelling format and for users outside of the heavily studied freshman group. Across all literature of library orientation studies, there is little reference to other library user populations such as faculty, staff, community users, distance students, or students not formally part of a class that requires library orientation.

Development of the Library Game Orientation (LibGO)

LibGO was developed by the authors with not only a consideration for the Walker Library user experience, but also with specific attention to the differing needs of the multiple user groups served by the library. This user-focused concern led to exploring creative methodologies such as user experience research and human-centered design thinking, a process of overlapping phases that produces a creative and meaningful solution in a non-linear way. The three pillars of design thinking are inspiration, ideation, and iteration.33 Defining the problem and empathizing with the users (inspiration) led into the ideation phase, whereby the authors created low- and high-fidelity prototypes. The prototypes were tested and improved (iteration) through the use of beta testing, in which playtesters interacted with the gamified orientation. The authors were novice developers of the gamified orientation, and this entailed a learning curve for not only the design thinking mindset but also the technical achievability. The development started with design thinking conversations and quickly turned to low-fidelity prototypes designed on paper. The development soon advanced to the actual coding so that the authors could get early designs tested before launching the final version.
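That coding was done in Twine, which stores each screen of the story as a linked "passage." Purely as an illustration (the passage names and text below are hypothetical, not taken from LibGO itself), a branching choice in Twine's Twee notation looks like this:

```twee
:: Start
Welcome to the library orientation! Which best describes you?
[[I am an undergraduate student->Undergrad]]
[[I am a graduate student->Grad]]

:: Undergrad
(The undergraduate storyline begins here, with its own links onward.)

:: Grad
(The graduate storyline begins here, with its own links onward.)
```

Each `[[link text->Target]]` sends the player to another passage, which is how a "choose your own adventure" path is assembled; Twine compiles the full set of passages into a single self-contained HTML file.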
Prior to deployment on the library's website, LibGO underwent a series of playtesting by library faculty, staff, and student employees. This testing was invaluable and led to such improvements as streamlining of processes and less ambiguity of text.

LibGO was developed with the Twine open-source software (https://twinery.org), a product which is primarily used for telling interactive, non-linear stories with HTML. Twine was an excellent application for this project as it allowed the creation of an online, interactive, "choose your own adventure" styled library orientation game, in which users could explore the library based upon their selection of one of multiple available plot directions. With a modest learning curve and as open-source software, Twine is highly accessible for those who are not accustomed to coding. For those who know HTML, CSS, JavaScript, variables, and conditional logic, Twine's capabilities can be extended. The library's interactive orientation adventure requires users to select one of the four available personas: undergraduate student, graduate student, faculty, or community member. Users subsequently follow that persona through a non-linear series of places, resources, and points of interest built with the HTML output of Twee (Twine's programming language). See figure 2 for an example point-of-interest page and figure 3 for an example of a user's final score after completing the gamified experience. Once the Twine story went through several iterations of design and testing, the HTML file was placed on the library's website for the gamified orientation to be implemented with actual users.

Figure 2. This instructional page within LibGO explains how to reserve different library spaces online. Upon reading this content, the user will progress by clicking on one of the hypertext lines in blue font at the bottom.

Figure 3. Based upon the displayed avatar, this LibGO page is representative of a graduate student's completion of LibGO. The page indicates the player's final score and gives additional options to return to the home page or complete the survey.

Purpose of Study

LibGO utilized the common "choose your own adventure" format whereby players progress through a storyline based upon their selection of one of multiple available plot directions. Although the literature suggests that other technology-based methods are an engaging and instructive mode of content delivery, little prior research exists regarding this specific approach to library outreach. Furthermore, no previous research appears to have been conducted to understand the different online, gamified orientation needs of various library groups. The researchers wanted to understand the potential of interactive storytelling as a means to educate a range of users on library services as well as make the library more approachable from a user perspective. The study was designed to understand the user experience of each of the four groups. The researchers hoped to discern which users, if any, found the gamified experience to be a helpful method of orientation to the library's physical and electronic services. Another area of inquiry was to determine whether this might be an effective delivery method by which to target certain segments of the campus for outreach. Finally, the study intended to determine whether this method of orientation might incline participants toward future use of the library.
Methodology

Overview

The authors selected an embedded mixed methods design approach in which quantitative and qualitative data were collected concurrently through the same assessment instrument.34 The survey instrument primarily collected quantitative data; however, a qualitative open-response question was embedded at the end of the survey. This question gathered additional data by which to answer the research questions. Each data set (one quantitative and one qualitative) was analyzed separately for each participant group, and then the groups were compared to develop a richer understanding of participant behavior.

Research Questions

The data collection and subsequent analysis attempted to answer the following questions:

1. Which group(s) of library users prefer to be oriented to library services and resources through the interactive storytelling format, as compared to other formats?
2. Which group(s) of library users are more likely to use library services and resources after participating in the interactive storytelling format of orientation?
3. What are user impressions of LibGO, and are there any differences in impression based on the characteristics of the unique user group?

Participants

Participants for the study were recruited in person and via the library website. In-person recruitment entailed the distribution of flyers and use of signage to recruit participants to play LibGO in a library computer lab during a one-day event. Online recruitment lasted approximately ten weeks and simply involved the placement of a link to LibGO on the home page of the library's website. A total of 167 responses were gathered through both methods, and participants were distributed as shown in table 1.

Table 1. Composition of study's participants

Group 1, undergraduate students: 55 responses
Group 2, graduate students: 62 responses
Group 3, faculty: 13 responses
Group 4, staff: 28 responses
Group 5, community members: 9 responses
Total: 167 responses

For the purposes of statistical data analysis, groups 3 and 4 were combined to produce a single group of 41 university employee respondents; also, group 5's data was not included in the statistical analysis due to the low number of participants. Qualitative data for all groups, however, was included in the non-statistical analysis.

Survey Instrument

A survey with twelve total questions was developed for this study and was administered online through Qualtrics. After playing LibGO, participants were asked to voluntarily complete the survey; if they agreed, they were redirected to the survey's website. Before answering any survey questions, the instrument administered an informed consent statement to participants. All aspects of the research, including the survey instrument, were approved through the university's institutional review board (protocol number 18-1293). The first part of the survey (see appendix A) consisted of ten questions, each with a ten-point Likert-scaled response. The first five questions were each designed to measure a preference construct, and the next five questions each measured a likelihood construct. The preference construct referred to the participant's preference for a library orientation: did they prefer LibGO's online interactive storytelling format, or did they prefer another format such as in-person talks? The likelihood construct referred to the participant's self-perceived likelihood of more readily engaging with the library in the future (both in person and online) after playing LibGO. The second part of the survey gathered the participant's self-reported affiliation (see table 1 for the list of possible group affiliations) as well as offered participants an open-ended response area for optional qualitative feedback.
Data Collection
The study's data was collected in two stages. In stage one, LibGO was unveiled to library visitors during a special campus-wide week of student programming events. On the library's designated event day, the researchers held a drop-in event at one of the library's computer labs (see Figure 4 for an example of event advertisement). Library visitors were offered a prize bag and snacks if they agreed to play LibGO and complete the survey. During the three-hour drop-in session, 58 individual responses were collected: the vast majority came from undergraduate students (n = 51), with additional responses from graduate students (n = 4), university staff employees (n = 2), and one community member. Community members were defined as anyone not currently directly affiliated with the university; this group may have included prospective students or alumni. Stage two began the day after the library drop-in event and simply involved the placement of a link to LibGO on the home page of the library's website. Any visitor to the library's website could click on the advertisement to be taken to LibGO. This link remained active on the library website for ten weeks, at which point the final data was gathered. A total of 167 responses were gathered during both stages, and participants were distributed as previously shown in Table 1.

Figure 4. Example of student LibGO event advertisement

Results

Quantitative Findings
Statistical analysis of each of the ten quantitative questions required the use of one-way ANOVA in SPSS. A post hoc test (Hochberg's GT2) was run in each instance to account for the different sample sizes. For all statistical analyses, only the data from undergraduates, graduate students, and university employees (a group which combined both faculty and staff results) were utilized.
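The study ran its one-way ANOVAs in SPSS; the same test is available in Python. The sketch below is an illustration only: the three response vectors are invented placeholders, not the study's data (which was not published at the item level), and SciPy does not implement the Hochberg's GT2 post hoc test the authors used, so a separate multiple-comparison step would still be needed.

```python
# Sketch only: the study used SPSS. The response vectors below are
# invented placeholders on the survey's 10-point scale, NOT the study's data.
from scipy.stats import f_oneway

undergraduates = [8, 9, 7, 8, 10, 6, 9]
graduates = [6, 7, 5, 6, 8, 7]
employees = [7, 6, 8, 5, 7]

# One-way ANOVA across the three groups
f_stat, p_value = f_oneway(undergraduates, graduates, employees)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = 3 - 1
df_within = len(undergraduates) + len(graduates) + len(employees) - 3
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, p = {p_value:.3f}")
```

A significant F would then be followed by pairwise post hoc comparisons corrected for the unequal group sizes, as the authors did with Hochberg's GT2 in SPSS.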
A listing of mean comparisons by group, for each of the ten survey questions, may be found in Table 2. The one-way ANOVAs yielded statistically significant results for three of the ten individual questions in the first part of the survey: questions 2, 3, and 6 (see Table 3).

Table 2. Descriptive statistics for survey results (10-point scale, with 10 as most likely). Group means are shown as (undergraduate students / graduate students / university employees).

1. In considering the different ways to learn about Walker Library, do you find this library orientation game to be more or less preferable as compared to other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? (7.02 / 6.39 / 6.02)
2. In your opinion, was the library orientation game a useful way to get introduced to the library's services and resources? (8.13 / 6.94 / 7.12)
3. If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)? (7.38 / 5.94 / 5.98)
4. Please indicate your level of agreement with the following statement: "As compared to playing the game, I would have preferred to learn about the library's resources and services by my own exploration of the library website." (6.11 / 6.50 / 5.88)
5. Please indicate your level of agreement with the following statement: "As compared to playing the game, I would have preferred to learn about the library's resources and services through an in-person orientation tour." (6.11 / 5.08 / 5.76)
6. After playing this orientation game, are you more or less likely to visit Walker Library in person? (8.27 / 6.94 / 6.90)
7. After playing this library orientation game, are you more or less likely to use the Walker Library website to find out about the library (such as hours of operation, where to go to get different materials/services, etc.)? (7.82 / 6.97 / 7.20)
8. After playing this library orientation game, are you more or less likely to seek help from a librarian at Walker Library? (6.95 / 6.58 / 6.63)
9. After playing this library orientation game, are you more or less likely to use the library's online resources (such as databases, journals, e-books)? (7.67 / 7.15 / 6.90)
10. After playing this library orientation game, are you more or less likely to attend a library workshop, training, or event? (6.96 / 6.73 / 6.24)

Table 3. Overall statistically significant group differences

Question | df | F | p | ω²
Question 2 | 2 | 3.714 | .027 | .03
Question 3 | 2 | 4.508 | .012 | .04
Question 6 | 2 | 7.178 | .001 | .07

Question 2 asked, "In your opinion, was the library orientation game a useful way to get introduced to the library's services and resources?" The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 3.714, p = .027, ω² = .03). The post hoc comparison using the Hochberg's GT2 test revealed that undergraduates rated LibGO's usefulness statistically significantly higher (M = 8.13, SD = 1.94, p = .031) than graduate students did (M = 6.94, SD = 2.72). There was no statistically significant difference between undergraduates and university employees (p = .145).
According to criteria suggested by Roger Kirk, the effect size of .03 indicates a small effect in the perceived usefulness of LibGO as an introduction among undergraduates.35

Question 3 asked, "If your friend needed a library orientation, how likely would you be to recommend the game over other orientation options (such as in-person tours, speaking with a librarian, or clicking through the library website on your own)?" The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 4.508, p = .012, ω² = .04). The post hoc comparison using the Hochberg's GT2 test found that undergraduates were statistically significantly more likely to prefer LibGO over other orientation options (M = 7.38, SD = 2.49, p = .021) as compared to graduate students (M = 5.94, SD = 3.06). There was no statistically significant difference between undergraduates and university employees (p = .053). The effect size of .04 indicates a small effect regarding undergraduate preference for LibGO versus other orientation options.

Question 6 asked, "After playing this library orientation game, are you more or less likely to visit Walker Library in person?" The one-way ANOVA found a statistically significant difference between groups (F(2, 155) = 7.178, p = .001, ω² = .07). The post hoc comparison using the Hochberg's GT2 test revealed that undergraduates were statistically significantly more likely to visit the library after playing LibGO (M = 8.27, SD = 2.09, p = .003) as compared to graduate students (M = 6.94, SD = 2.20). Additionally, the test found that undergraduates were statistically significantly more likely to visit the library after playing LibGO (p = .007) as compared to university employees (M = 6.90, SD = 2.08). According to criteria suggested by Kirk, the effect size of .07 indicates a medium effect regarding undergraduate potential to visit the library in person after playing LibGO.36
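The reported ω² values follow directly from the F statistics and degrees of freedom. The sketch below applies the standard omega-squared formula for a one-way ANOVA, ω² = df_b(F − 1) / (df_b(F − 1) + N), to the values in Table 3 and reproduces .03, .04, and .07; the function name is ours for illustration, not from the study.

```python
def omega_squared(f_stat, df_between, df_within):
    """Omega-squared effect size recovered from a one-way ANOVA F statistic."""
    n = df_between + df_within + 1  # total number of observations
    adj = df_between * (f_stat - 1.0)
    return adj / (adj + n)

# F values reported in Table 3 for questions 2, 3, and 6 (df = 2, 155; N = 158)
for label, f in [("Q2", 3.714), ("Q3", 4.508), ("Q6", 7.178)]:
    print(label, round(omega_squared(f, 2, 155), 2))  # prints 0.03, 0.04, 0.07
```

This cross-check also explains why the three effects land in Kirk's "small" to "medium" ranges despite the significant p values: with N = 158, even a clearly significant F can correspond to a modest share of explained variance.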
In addition to testing each individual survey question, tests were run to examine possible group differences by construct (preference and likelihood). The preference construct was an aggregate of survey questions 1-5, and the likelihood construct was an aggregate of survey questions 6-10. For both constructs, the one-way ANOVA found no statistically significant results. In all, the quantitative findings indicated three areas in which the experience of playing LibGO was more helpful for the surveyed undergraduates than for the other surveyed groups (i.e., graduate students or university employees). At this point, the analysis turned to the qualitative data so as to better understand participant views of LibGO.

Qualitative Findings
Analysis of the qualitative results was limited to the data collected in the survey's final question. Question 12 was an open-response area and was intentionally prefaced with a vague prompt: "Do you have any final thoughts for the library (suggestions, additions, modifications, comments, criticisms, praise, etc.)?" Of the 167 total survey responses, 67 individuals chose to answer this question. Preliminary analysis showed that the feedback derived from this question covered a spectrum of topics, ranging from remarks on the LibGO experience itself to broader concerns regarding other library services. Open coding strategies were utilized to interpret the content of participant responses. Under this methodology, the responses were evaluated for general themes and then coded and grouped under a constant comparative approach.37 NVivo 12 software was used to code all 67 participant responses. Initial coding yielded eight open codes, but these were later consolidated into six final codes (see Table 4). One code (LibGO improvement tip) was rather nuanced and yielded five axial codes (see Table 5).
Axial codes denoted secondary concerns which fell under a larger category of interest. Although some participants gave longer feedback which addressed multiple concerns, care was taken to assign each distinct concern to a specific code. It is therefore important to note that because some comments addressed multiple concerns, the total number of concerns (n = 76) is greater than the total number of individuals responding to the prompt (n = 67).

Table 4. Distribution of qualitative codes by user group

Code | Undergraduate | Graduate | Faculty | Staff | Community member | Total # concerns
Positive feedback | 7 | 7 | 1 | 4 | 2 | 21
Negative feedback | 1 | 2 | 0 | 3 | 0 | 6
In-person tour preference | 2 | 3 | 0 | 1 | 0 | 6
LibGO improvement tip | 5 | 11 | 1 | 3 | 3 | 23
Library services feedback | 2 | 4 | 3 | 0 | 0 | 9
Library building feedback | 1 | 7 | 1 | 2 | 0 | 11
Total | 18 | 34 | 6 | 13 | 5 | 76

Discussion of Qualitative Themes

Positive feedback (21 separate concerns). Affirmative comments regarding LibGO were primarily split between undergraduate and graduate students, with a small number of comments coming from the other groups. Although all groups stated that the game was helpful, one undergraduate wrote, "I wish I would've received this orientation at the very beginning of the year!" A graduate student declared, "This was a creative way to engage students, and I think it should be included on the website for fun." Both community members commented on the utility of LibGO in providing an orientation without having to physically come to the library; for example, "Interactive without having to actually attend the library in person which I liked." Additionally, a community member pointed out the instructional capability of LibGO, writing, "I think I learned more from the game than walking around in the library."

Negative feedback (6 separate concerns). Unfavorable comments regarding LibGO primarily challenged the orientation's characterization as a "game" in terms of its lack of fun.
One graduate student wrote a comment representative of this concern: "The game didn't really seem like a game at all." A particularly searing comment came from a university staff member who wrote, "Calling this collection of web pages an 'interactive game' is a stretch, which is a generous way of stating it."

In-person tour preference (6 separate concerns). A small number of concerns indicated a preference for in-person orientations over online ones. One undergraduate cited the ability to ask questions during an in-person tour as an advantage of that delivery medium. A graduate student mentioned their desire for kinesthetic learning over an online approach, writing, "I prefer hands on exploration of the library."

LibGO improvement tip (23 separate concerns). Suggested improvements to LibGO were the largest area of qualitative feedback and produced five axial themes (subthemes); see Table 5 for a breakdown of the five axial themes by group.
1. Design issues were the largest cited area of improvement, and the most commonly mentioned design problem was the inability of the user to go back to previously seen content. Although this functionality did in fact exist, it was apparently not intuitive to users; design modifications in future iterations are therefore critical. Other users made suggestions regarding the color scheme used and the ability to magnify image sizes.
2. User experience was another area of feedback and primarily included suggestions on how to make LibGO a more fun experience. One graduate student offered a role-playing game alternative. Another graduate student expressed interest in a game with side missions, in addition to the overall goals, where tokens could be earned for completed missions; the student justified these changes by stating, "I feel that incorporating these types of ideas will make the game more enjoyable." In suggesting similar improvements, one undergraduate stated that LibGO "felt more like a quiz than a game."
3. Technology issues primarily addressed two related problems: images not loading and broken links. Images not loading could depend on many factors, including the user's browser settings, internet traffic (volume) delaying load time, or broken image links, among others. Broken links could be the root issue, since the images used in LibGO were taken from other areas of the library website. This method of gathering content pointed out a design vulnerability of relying on existing image locations (controlled by non-LibGO developers) rather than on images hosted exclusively for LibGO.
4. Content issues were raised exclusively by graduate students. One student felt that LibGO placed an emphasis on physical spaces in the library and did not give a deep enough treatment to library services. Another graduate student asked for "an interactive map to click on so that we physically see the areas" of the library, thus making the interaction more user-friendly with a visual.
5. Didn't understand purpose is a subtheme where improvement is needed and is based on two comments made by the two university staff members. One wrote that "an online tour would have been better and just as informative," although LibGO was designed to be not only an online tour of the library but also an orientation to the library's services. The other staff member wrote, "I read the rules but it was still unclear what the objective was." In all, it is clear that LibGO's purpose was confusing for some.
Table 5. LibGO improvement tip axial codes by user group

Axial code | Undergraduate | Graduate | Faculty | Staff | Community member | Total # concerns
Design | 4 | 3 | 0 | 0 | 1 | 8
User experience | 1 | 2 | 1 | 0 | 1 | 5
Tech issue | 0 | 1 | 0 | 1 | 0 | 2
Content | 0 | 5 | 0 | 0 | 1 | 6
Didn't understand purpose | 0 | 0 | 0 | 2 | 0 | 2
Total | 5 | 11 | 1 | 3 | 3 | 23

Library services feedback (9 separate concerns). Several participants took the opportunity to provide feedback on general library services rather than on LibGO itself. Undergraduates simply gave general positive feedback about the value of the library, but many graduate students gave recommendations regarding specific electronic resource improvements. Additionally, one graduate student wrote, "I think it is critical to meet with new graduate students before they start their program," something the library used to do but had not pursued in recent years. Although these comments did not directly pertain to LibGO, the authors accepted all of them as valuable feedback to the library.

Library building feedback (11 separate concerns). This was another theme in which graduate students dominated the comments. Feedback ranged from requests for microwave access and additional study tables to better temperature control in the building. Several participants asked for greater enforcement of quiet zones. As with the library services feedback, the authors took these comments as helpful to the overall library rather than to LibGO.

Discussion
The results of this study indicated that some groups of library visitors received the gamified library orientation experience better than other groups did. Undergraduate students indicated the largest appreciation for a library orientation via LibGO.
Specifically, they demonstrated a statistically significant difference over the other groups in supporting LibGO's usefulness as an orientation tool, in preferring LibGO over other orientation formats, and in their likelihood of future use of the physical library after playing LibGO. These very encouraging results provide evidence for the efficacy of alternative means of library orientation. The qualitative results provided additional helpful insight regarding user impressions from each of the five surveyed groups. This feedback demonstrated that a variety of groups benefited from the experience of playing LibGO, including some community members who appreciated LibGO as a means of becoming acclimated to the library without having to enter the building. A virtual orientation format was not ideal for a few players who indicated a preference for a face-to-face orientation due to the ability to ask questions. Many people identified areas of improvement for LibGO. Graduate students in particular offered a disproportionate number of suggestions as compared to the other groups. While they provided a great deal of helpful feedback, it is possible that graduate students were so distracted by the perceived problems that they could not fully take in the experience or gain value from LibGO's orientation purpose. It is also very likely that LibGO simply was not very fun for these players: several players noted that it did not feel like a game but rather a collection of content. The review of literature indicated that this amusement issue is a common pitfall of educational games. Although the authors tried to design an enjoyable orientation experience, it is possible that more work is needed to satisfy user expectations. The mixed-methods design of this study was instrumental in providing a richer understanding of user perceptions.
While the statistical analysis of participant survey responses was very helpful in identifying clear trends between groups, the qualitative analysis helped the authors draw valuable conclusions. Specifically, the open-response data demonstrated that additional groups such as graduate students and community members appreciated the experience of playing LibGO; this information was not readily apparent through the statistical analysis. Additionally, the qualitative analysis demonstrated that many groups had concerns regarding areas of improvement that may have impaired their user experience. These important findings could help guide future directions of the research. In all, the authors concluded this phase of the research satisfied that LibGO showed great promise for library orientation delivery but could benefit from continued development and future user assessment. Although undergraduate students seemed most receptive overall to a virtual orientation experience, other groups appeared to have benefited from the resource.

Study Limitations
A primary limitation of this study was its small sample size. Although the entire university campus was targeted for participation in the study, the number of respondents was far too small to generalize the results. Despite this limitation, however, the study's population reflected many different groups of library patrons on campus. The findings are therefore valuable as a means of stimulating future discussion regarding the value of alternative library orientation methods utilizing gamification. Another limitation is that the authors did not pre-assess the targeted groups for their prior knowledge of Walker Library services and building layout, nor for their interest in learning about these topics. It is possible that various groups did not see the value in learning about the library for a variety of reasons.
Faculty members, in particular, may have considered their prior knowledge adequate for navigating the electronic holdings or building layout without recognizing the value of the many other services offered physically and electronically by the library. All groups may have experienced a level of "library anxiety" that prevented them from being motivated to learn more about the library.38 It is difficult to understand the range of covariate factors without a pre-assessment. Finally, there was qualitative evidence supporting the limitation that LibGO did not properly convey its stated purpose of orientation rather than imparting research skills. Without understanding LibGO's focus on library orientation, users could have been confused or disappointed by the experience. Although care was taken to make this purpose explicit, some users indicated their confusion in the qualitative data. This observed problem points to a design flaw that undoubtedly had some bearing on the study's results.

Conclusion & Future Research
Convinced of the importance of the library orientation, the authors sought to move this traditional in-person experience to a virtual one. The quantitative results indicated that the gamified orientation experience was useful to undergraduate students in its intended purpose of acclimating users to the library, as well as in encouraging their future use of the physical library. At a time in which physical traffic to the library has shown a marked decline, new outreach strategies should be considered.39 The results were also helpful in showing that this particular iteration of the gamified orientation was preferred over other delivery methods by undergraduate students, as compared to other groups, to a statistically significant level.
This is an important finding, as it demonstrates that a diversified outreach strategy is necessary: different groups of library patrons desire their orientation information in different formats. The next logical question to ask, however, is: why did the other groups examined through the statistical data analysis (graduate students and university employees) not appreciate the gamified orientation to the same level as undergraduates? The answers to this question are complicated and may be explained in part by the qualitative analysis. Based upon those findings, it is possible that the game did not appeal to these groups on the basis of fun or enjoyment; this concern was specifically mentioned by graduate students. Faculty and staff provided a smaller volume of qualitative feedback; it is therefore difficult to speculate as to their exact reasons for disengagement with LibGO. With this concern in mind, the authors would like to concentrate their next iteration of research on the specific library orientation needs of graduate students and faculty. Both groups present different, but critical, needs for outreach. Graduate students were the largest group of survey respondents, presumably indicating a high level of interest in learning more about the library. Many graduate programs at MTSU are delivered partially or entirely online; as a result, these students may be less likely to come to campus. Due to graduate students' relatively infrequent visits to campus, a virtual library orientation could be even more meaningful for them in meeting their need for library services information. Faculty are another important group to target because, if they lack a full understanding of the library's offerings, they are unlikely to create assignments that fully utilize the library's services. Although it is possible that faculty prefer an in-person orientation, many new faculty have indicated limited availability for such events. A virtual orientation seems conducive to busy schedules.
However, it is possible that the issue is simply a matter of marketing: faculty may not know that a virtual option is available, nor do they necessarily understand all that the library has to offer. In all, future research should begin with a survey to understand what both groups already know about the library, as well as the library services they desire. Another necessary step in future research would be the expansion of the development team to include computer programmers. Although the authors feel that LibGO holds great promise as a virtual orientation tool, more needs to be done to enhance the user's enjoyment of the experience. Twine is a user-friendly software tool that other librarians could pick up without having to be computer programmers; however, programmers (professional or student) could bring design expertise to the project. Future iterations of this project should incorporate the skills of multiple groups, including expertise in libraries, user research, visual design, interaction design, programming, and marketing, along with testers from each type of intended audience. Collectively, this group will have the greatest impact on improving the user experience and ultimately the usefulness of a gamified orientation experience. This experience with gamification, and specifically interactive storytelling, was valuable for Walker Library. These results should encourage other libraries seeking an alternate delivery method for orientations. The authors hope to build upon the lessons learned from this mixed methods research study of LibGO to find the correct outreach medium for their range of library users.

Acknowledgments
Special thanks to our beta playtesters and student assistants who worked the LibGO event, which was funded, in part, by MT Engage and Walker Library at Middle Tennessee State University.
Appendix A: Survey Instrument

[The survey instrument, reproduced as images on pages 19-24 of the published article, is not rendered in this text version.]

Endnotes

1 Sandra Calemme McCarthy, "At Issue: Exploring Library Usage by Online Learners with Student Success," Community College Enterprise 23, no. 2 (January 2017): 27-31; Angie Thorpe et al., "The Impact of the Academic Library on Student Success: Connecting the Dots," portal: Libraries and the Academy 16, no. 2 (2016): 373-92, https://doi.org/10.1353/pla.20160027.
2 Steven Ovadia, "How Does Tenure Status Impact Library Usage: A Study of LaGuardia Community College," Journal of Academic Librarianship 35, no. 4 (January 2009): 332-40, https://doi.org/10.1016/j.acalib.2009.04.022.
3 Chris Leeder and Steven Lonn, "Faculty Usage of Library Tools in a Learning Management System," College & Research Libraries 75, no. 5 (September 2014): 641-63, https://doi.org/10.5860/crl.75.5.641.
4 Kyle Felker and Eric Phetteplace, "Gamification in Libraries: The State of the Art," Reference and User Services Quarterly 54, no. 2 (2014): 19-23, https://doi.org/10.5860/rusq.54n2.19; Nancy O'Hanlon, Karen Diaz, and Fred Roecker, "A Game-Based Multimedia Approach to Library Orientation" (paper, 35th National LOEX Library Instruction Conference, San Diego, May 2007), https://commons.emich.edu/loexconf2007/19/; Leila June Rod-Welch, "Let's Get Oriented: Getting Intimate with the Library, Small Group Sessions for Library Orientation" (paper, Association of College and Research Libraries Conference, Baltimore, March 2017), http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2017/letsgetoriented.pdf.
5 Kelly Czarnecki, "Chapter 4: Digital Storytelling in Different Library Settings," Library Technology Reports, no. 7 (2009): 20-30; Rebecca J. Morris, "Creating, Viewing, and Assessing: Fluid Roles of the Student Self in Digital Storytelling," School Libraries Worldwide, no. 2 (2013): 54-68.
6 Sandra Marcus and Sheila Beck, "A Library Adventure: Comparing a Treasure Hunt with a Traditional Freshman Orientation Tour," College & Research Libraries 64, no. 1 (January 2003): 23-44, https://doi.org/10.5860/crl.64.1.23.
7 Lori Oling and Michelle Mach, "Tour Trends in Academic ARL Libraries," College & Research Libraries 63, no. 1 (January 2002): 13-23, https://doi.org/10.5860/crl.63.1.13.
8 Kylie Bailin, Benjamin Jahre, and Sarah Morriss, Planning Academic Library Orientations: Case Studies from Around the World (Oxford, UK: Chandos Publishing, 2018): xvi.
9 Bailin, Jahre, and Morriss, Planning Academic Library Orientations.
10 Marcus and Beck, "A Library Adventure"; A. Carolyn Miller, "The Round Robin Library Tour," Journal of Academic Librarianship 6, no. 4 (1980): 215-18; Michael Simmons, "Evaluation of Library Tours," EDRS, ED 331513 (1990): 1-24.
11 Marcus and Beck, "A Library Adventure"; Oling and Mach, "Tour Trends"; Rod-Welch, "Let's Get Oriented."
12 Pixey Anne Mosley, "Assessing the Comfort Level Impact and Perceptual Value of Library Tours," Research Strategies 15, no. 4 (1997): 261-70, https://doi.org/10.1016/s0734-3310(97)90013-6.
13 Mosley, "Assessing the Comfort Level Impact and Perceptual Value of Library Tours."
14 Marcus and Beck, "A Library Adventure," 27.
15 Kenneth J. Burhanna, Tammy J. Eschedor Voelker, and Jule A. Gedeon, "Virtually the Same: Comparing the Effectiveness of Online versus In-Person Library Tours," Public Services Quarterly 4, no. 4 (2008): 317-38, https://doi.org/10.1080/15228950802461616.
16 Burhanna, Voelker, and Gedeon, "Virtually the Same," 326.
17 Burhanna, Voelker, and Gedeon, "Virtually the Same," 329.
18 Felker and Phetteplace, "Gamification in Libraries."
19 Felker and Phetteplace, "Gamification in Libraries," 20.
20 Felker and Phetteplace, "Gamification in Libraries."
21 Felker and Phetteplace, "Gamification in Libraries"; O'Hanlon et al., "A Game-Based Multimedia Approach."
22 Mary J. Broussard and Jessica Urick Oberlin, "Using Online Games to Fight Plagiarism: A Spoonful of Sugar Helps the Medicine Go Down," Indiana Libraries 30, no. 1 (January 2011): 28-39.
23 Melissa Mallon, "Gaming and Gamification," Public Services Quarterly 9, no. 3 (2013): 210-21, https://doi.org/10.1080/15228959.2013.815502.
24 J. Long, "Chapter 21: Gaming Library Instruction: Using Interactive Play to Promote Research as a Process," Distributed Learning (January 1, 2017): 385-401, https://doi.org/10.1016/b978-0-08-100598-9.00021-0.
25 Rod-Welch, "Let's Get Oriented."
26 O'Hanlon et al., "A Game-Based Multimedia Approach."
27 Mallon, "Gaming and Gamification."
28 Anna-Lise Smith and Lesli Baker, "Getting a Clue: Creating Student Detectives and Dragon Slayers in Your Library," Reference Services Review 39, no. 4 (November 2011): 628-42, https://doi.org/10.1108/00907321111186659.
29 Monica Fusich et al., "HML-IQ: Fresno State's Online Library Orientation Game," College & Research Libraries News 72, no. 11 (December 2011): 626-30, https://doi.org/10.5860/crln.72.11.8667.
30 Broussard and Oberlin, "Using Online Games"; Fusich et al., "HML-IQ"; O'Hanlon et al., "A Game-Based Multimedia Approach."
31 Felker and Phetteplace, "Gamification in Libraries."
32 Felker and Phetteplace, "Gamification in Libraries"; Fusich et al., "HML-IQ."
33 "Design Thinking for Libraries: A Toolkit for Patron-Centered Design," IDEO (2015), http://designthinkingforlibraries.com.
34 John W. Creswell and Vicki L. Plano Clark, Designing and Conducting Mixed Methods Research (Thousand Oaks, CA: Sage Publications, 2007).
35 Roger Kirk, "Practical Significance: A Concept Whose Time Has Come," Educational and Psychological Measurement, no. 5 (1996).
Japanese Character Input: Its State and Problems

Ichiko Morita: Ohio State University, Columbus.

Computer processing of information is highly advanced in Japan, and it continues to be researched and improved by the cooperative efforts of the government, private corporations, and individual scientists, who are among the best in the world. This paper introduces various approaches to the computer input of information currently developed in Japan, and discusses the possibility of their application to the processing of East Asian vernacular-language materials in large research libraries in this country.

Processing of catalog information through an on-line shared-cataloging system has become a part of American libraries' common practice, and its financial and temporal savings have been proven. However, there are some materials not yet considered appropriate for computer processing. The Library of Congress's plans for romanizing catalog information for all non-Roman-language materials and putting them on MARC tapes for quick distribution of information have been objected to by a large number of specialists in the field.
The opponents' reason has been that computerization of vernacular languages by means of transliteration is not satisfactory. Such materials are best handled in their own writing systems (the languages in this category include Chinese, Japanese, Korean, Hebrew, Arabic, and various languages in India). Those specialists in the field who see systems working for Roman-alphabet materials generally agree that automated systems are very efficient and useful for their research. It would be best if non-Roman-language materials could be processed through computers using their own writing systems. As far as technology goes, it is possible to process such materials in their original form. Systems that have the capability of handling those languages directly have been developed; among the most advanced are the Japanese systems.

Japan has overcome numerous difficulties in developing systems that are capable of handling Japanese characters. Although automation of libraries is not as widespread as in the United States (due perhaps to a delay in the development of computers), some Japanese libraries already have a decade of experience with advanced systems. Many others have recently started to adopt them. Wide utilization of these systems seems to be just a matter of time. It will be beneficial to review Japanese methods and consider possible adaptation of them to our systems. In the following sections, various Japanese approaches to inputting the Japanese language are explained with an eye to future automation of non-Roman-language materials in this country.

[Manuscript received August 1980; accepted December 1980.]

The Japanese Language and the Computer

It should be noted, first of all, that the Japanese language is an entirely different language from Chinese, although they are often confused because they both use the same Chinese ideographs in writing. Each Chinese ideograph, or character, symbolizes a certain object or denotes a certain meaning.
The Japanese use them in the Japanese language with its own pronunciation in the context of its own grammar, whereas the Chinese use them in the Chinese language with its own pronunciation in the context of its own grammar. This means that a Chinese ideograph could mean the same thing in both languages, but be pronounced or read differently and used in different grammatical environments. The Chinese ideographs used in Japanese are referred to as kanji, which are, to complicate the matter, used along with Japanese syllabaries called kana. Kana, in two styles called hiragana and katakana, total about 170 characters. Depending on whether a kanji is used with another kanji or kana, the reading of it varies. At different times one set of kanji may be read in two or three different ways.

The total number of kanji is about 50,000. In comprehensive dictionaries, about 40,000 or more kanji are included. Medium-sized ones, such as Ueda's Daijiten, include about 15,000; concise ones about 8,000 to 10,000.[1] According to several tests on frequency of kanji occurrence made in various Japanese institutions, approximately 3,000 kanji appear in high frequency, 3,000 are of moderate frequency, and several thousand more are of infrequent occurrence. As for geographical names, 2,279 kanji will cover most of Japan, and 1,500 kanji will suffice to cover personal names, except for very unusual names.[2] Approximately 6,300 characters are needed for major newspapers such as the Asahi and the Nikkei. The trends in the use of kanji are to simplify the characters themselves and not to use difficult kanji with many strokes. In 1946, the Japanese government established 1,850 kanji as those for daily use,[3] and today newspapers and official documents use only those kanji, except for some personal and geographical names. The implication of this trend for computerization of kanji is that, depending on the documents to be covered, the need in number and kind of kanji varies.
That is, institutions that deal with scientific or current information do not need as many kanji as other types of institutions that handle documents covering longer periods and larger areas of knowledge.

[Journal of Library Automation 14/1, March 1981]

For example, the Japan Information Center for Science and Technology, which mainly handles the latest scientific information, claims that with approximately 6,000 kanji it can function satisfactorily. An example from the other extreme is the National Institute of Japanese Literature, whose collection covers older historical periods, during which a great number of kanji were used and many kanji went through changes, mostly simplification in style. The latter institute is constantly adding new kanji to its system.

It is obvious then that the first problem in the computerization of Japanese materials is the number and kind of kanji to be included in the system. This is a problem of hardware. The other problem concerns software. When Japanese is written, its words are not divided as in English, for the combination of kanji and kana helps visually to make sentences understandable without word division. Also, compound nouns are made by adding other words to a noun, so that, if a set of kanji represents one noun, one can expand its meaning by adding another kanji to it. Though word division has been a problem in transliteration and is not new in computerization, both arbitrarily divided words and undivided words in particular become serious problems in computer files and in the retrieval of information.

A question may be raised as to why we need kanji processing in spite of these problems; why isn't computer handling of alphanumerics and kana, which is in use today, sufficient? The answer is mainly that kanji possess a definite visual effect. Also, if only romanized languages or kana alone are used, many homonyms may make the meaning ambiguous.
While it is quite possible to write Japanese only in kana or in romanized forms, as proven by the systems in use, it is better, for efficiency and precision, to express the language in the way it is actually written. As for the problem of word division, study is in progress on methods of dividing words systematically and automatically, incorporating the latest research in the field of applied linguistics. This is more concerned with the development of software, and this paper will not delve into it.

Inputting

Various Japanese approaches to inputting kanji and kana are organized below into six major groupings according to different inputting devices. They are: (1) full keyboard, (2) component pattern input, (3) kana keyboard, (4) stenotype, (5) optical character recognition, and (6) voice recognition. These six methods are further divided into subvariations as shown in table 1.[4]

[Table 1. Input systems. Major approaches, variations, and subvariations: full keyboard (kanji teletypewriter; Japanese typewriter: character location, coded-plate scanning, coded typeface, modified coded typeface; tablet style: electromagnetic, electrostatic, photoelectric); component pattern input; kana keyboard (two-key stroke: location correspondence, association memory; display selection; kana-kanji conversion: word conversion, sentence conversion); stenotype; optical character recognition; voice recognition. For each, the table gives the training needed (small to extensive), input speed (roughly 20 to 120 characters per minute), and the number of characters accommodated (roughly 1,000 to 4,096).]

Full Keyboard

The main feature of this approach is the use of a full character keyboard as the inputting device. The operator uses the full character keyboard rather than codes or other symbols.
The keyboard varies depending on models, usually consisting of frequently used kanji and both sets of kana, supplemented by Arabic numerals, Roman, Cyrillic, and Greek alphabets in upper and lower cases, often with italics, signs, and diacritical marks. To each character, a two-byte binary code (expressed by a four-digit numeral) is assigned, so that when the inputter types a character the code for the character is punched on paper or cassette tape.

Kanji Teletypewriter

The oldest method for kanji inputting, still widely in use, is the kanji teletypewriter system or multishift system. One variation of this approach, developed by the National Diet Library at an early stage of its computerization, has 192 character keys, each having fourteen characters in three columns and five lines, as shown in figure 1. In addition, there are fourteen selection keys arranged in three columns and five rows on the lower left of the keyboard to correspond to the pattern of characters on each character key. When an operator strikes character key B with the right hand and selection key A with the left hand at the same time, the code for character C is punched on the tape.

[Figure 1. Kanji teletypewriter keyboard of the National Diet Library.]
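As a rough illustration of the multishift idea, the sketch below uses a hypothetical character-key layout and hypothetical two-byte codes (the glyphs, key number, and code values are inventions for illustration, not the National Diet Library's actual assignments): a simultaneous character-key and selection-key stroke picks one glyph and yields the four-digit numeral punched for it.

```python
# Sketch of multishift (kanji teletypewriter) input. Assumptions:
# a hypothetical character key holding up to 14 glyphs, a selection
# key numbered 0-13 picking one of them, and hypothetical two-byte
# codes written as four-digit numerals.

# Hypothetical layout of one character key (14 glyphs).
CHAR_KEY_42 = ["日", "本", "語", "図", "書", "館", "情", "報",
               "処", "理", "計", "算", "機", "学"]

# Hypothetical code table: glyph -> two-byte code.
CODES = {glyph: 0x2120 + i for i, glyph in enumerate(CHAR_KEY_42)}

def keystroke(char_key, selection):
    """Simultaneous character-key + selection-key stroke -> punched code."""
    glyph = char_key[selection]
    code = CODES[glyph]
    return glyph, f"{code:04X}"  # code punched as a four-digit numeral

glyph, code = keystroke(CHAR_KEY_42, 2)  # selection key 2 picks "語"
```

The layout mirrors the described geometry: 192 such character keys, each disambiguated by one of fourteen selection keys struck with the other hand.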
Included on this keyboard are:

  Kanji: 2,006
  Kana: 90
  Western alphabets: 144
  Numerals: 20
  Symbols and marks: 210
  Kanji patterns: 40
  Kanji components: 139
  Space: 1
  Total: 2,650[6]

By using shift keys on the upper left of the keyboard, kana in both styles and alphabets in upper and lower cases can be input. For satisfactory operation, the keyers must be professionally trained, and it is said that one to three months are necessary for them to be fully trained and able to input an average of fifty to sixty kanji per minute. This is not as fast as most other methods discussed.

Japanese Typewriter

The second of the full keyboard approaches is the Japanese typewriter method, which uses a modification of the standard Japanese typewriter with a tray filled with kanji printing types. The operator finds a character in the tray and punches it by moving a metal handle as the type bar is punched down to print the character. This is rather primitive and different in its operation from the English typewriter, which uses the ten-finger touch method. There are four variations:

Character location method. Kanji are arranged on a keyboard by their codes, so that when a key is punched, the kanji is typed on regular paper as if it had been done by a regular Japanese typewriter. At the same time, the code is automatically read from the location of the key and is punched on tape.

Coded-plate scanning method. Each type bar has a plate attached on its side, and the code for the character is marked on the plate. When a key is typed, the kanji is printed on paper and the code from the plate is optically scanned at the same time.

Coded typeface method. Each typeface is made with a character on the upper half and a code for it on the lower half. When a key is typed, both the character and the code are printed. The code on the bottom half is optically scanned from the printed paper.

Modified coded typeface method.
Instead of typing both characters and codes on the paper, this method prints only the characters on the front of the paper and, at the same time, prints a bar code on the back of the paper. The machine capable of doing this is complicated. The size of the character on a typeface can be bigger than in the variation above, and the bar code can be larger to make the scanning of the code easier and more precise.

As the discussion of the four variations indicates, the Japanese typewriter offers the advantage of being able to monitor input at the time of keying. Since the Japanese typewriter has been in use for a long time in offices where a quantity of official documents are dealt with, and since ordinary Japanese typists can use this system without any additional training, the use of equipment similar in operation was considered advantageous. However, it should be noted that Japanese typewriters have never become as prevalent as English typewriters, and the demand for computers comes from more areas than just those where Japanese typewriters are used. For this reason, the use of Japanese typewriters is not as advantageous as its proponents claim. An obvious disadvantage is its slow speed of operation: thirty to fifty characters per minute on the average. Another disadvantage is that the number of characters on the keyboard is limited to about 3,000.

Tablet Style

This method, also known as the pen-touch method, was recently developed. Each character has a key, and characters are arranged in a certain order. The location of the characters on a matrix sheet determines the two-byte binary code, which consists of a two-digit numerical abscissa and a two-digit numerical ordinate. The operator touches the key with a pen-shaped detector and the code for the character is punched on the paper tape. The operation is one-handed, requiring only a light touch of the key by a detector.
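The tablet's code assignment (the position on the matrix sheet gives a two-digit abscissa followed by a two-digit ordinate) can be sketched as follows; the coordinate values used are illustrative, not taken from any actual tablet layout.

```python
# Sketch of the pen-touch (tablet) code assignment: a character's
# position on the matrix sheet determines its punched code, a
# two-digit abscissa followed by a two-digit ordinate.

def location_code(abscissa, ordinate):
    """Compose the four-digit code punched for a key at (abscissa, ordinate)."""
    if not (0 <= abscissa <= 99 and 0 <= ordinate <= 99):
        raise ValueError("coordinates must be two-digit numbers")
    return f"{abscissa:02d}{ordinate:02d}"

# Touching the key at abscissa 23, ordinate 7 punches code "2307".
assert location_code(23, 7) == "2307"
```

Because the code is read directly from the key's position, no code table needs to be memorized, which is why this method requires no special training.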
Keys are on one flat keyboard and are color-coded by sections to make it easier for the operator to locate them. Light-touch operation reduces operator fatigue. This method does not require special training. However, the number of kanji on a keyboard of reasonable size is limited to approximately 3,500. By shifting, twice as many characters can be handled, though not all characters are indicated on the keyboard. Speed of input is not very high: thirty to seventy characters per minute. This system, already used in many libraries, is becoming increasingly popular because of its easy operation. There are three different technologies used: electromagnetic, electrostatic, and photoelectric. There are no differences in actual input operation among these electronically different methods.

Component Pattern Input

Although not a full keyboard method, component pattern input is closely related to these methods. The idea behind this approach is that most kanji are composed of one or more basic component units, two or more of which can be put together into one kanji according to one predetermined pattern out of forty general patterns. The inputting device has keys for those forty patterns along with keys for individual components on a special keyboard. To compose a kanji, a key for an appropriate pattern is selected and typed, and components are chosen to fill each individually numbered block of the selected pattern, following the established order.[7] Each pattern has a code, and so does each component. When a key is typed, the code is punched on a paper tape, as shown in figure 2. There are cases where a kanji with two components can be a component of another kanji, as shown in the first and second examples in figure 2. A kanji is constructed by punching at least three codes: one for a pattern and at least two for components.
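The pattern-plus-components scheme can be sketched as a small lookup. All codes below (the pattern code, the component codes, and the resulting kanji code) are hypothetical values for illustration, not the actual assignments of any deployed system.

```python
# Sketch of component pattern input: a kanji is punched as one
# pattern code plus two or more component codes, and a kanji
# dictionary maps that combination to the kanji's own two-byte code.
# All code values here are hypothetical.

PATTERN_LEFT_RIGHT = "2804"               # hypothetical left/right pattern
COMPONENTS = {"木": "3813", "寸": "2723"}  # hypothetical component codes

# Hypothetical dictionary: punched code sequence -> kanji code.
KANJI_DICT = {("2804", "3813", "2723"): "8118"}

def punch(pattern, *components):
    """The sequence of codes punched on tape for one composed kanji."""
    return (pattern,) + components

def to_kanji_code(punched):
    """Dictionary conversion to the kanji's own two-byte code."""
    return KANJI_DICT[punched]

seq = punch(PATTERN_LEFT_RIGHT, COMPONENTS["木"], COMPONENTS["寸"])
assert to_kanji_code(seq) == "8118"
```

The second function corresponds to the dictionary stored on the magnetic drum: the multi-code sequence exists only on the input tape, and downstream processing sees a single two-byte code per kanji.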
Then a kanji dictionary consisting of several thousand master-code combinations (see figure 3) is stored on a magnetic drum, and the several codes punched on paper or cassette tape to compose a kanji are converted through this dictionary to a two-byte binary code assigned to that particular kanji. These are then handled like other kanji with an individual code.

[Figure 2. Component pattern input.]

[Figure 3. Kanji dictionary.]

Though this can be a stand-alone approach to inputting kanji, the principle has been adopted by the National Diet Library to supplement the inputting of kanji on the full keyboard kanji teletypewriter. The National Diet Library uses this system when inputting kanji that are not included on its keyboard. Instead of having a special separate keyboard, the kanji teletypewriter of the National Diet Library integrates patterns and components as equivalents to other characters. Its keyboard includes forty patterns and approximately 140 components.

This was the most elementary approach to computerizing kanji. Conceived in the early developmental stage of kanji processing, it used one of the characteristics of kanji, the composition from several components. In actual situations, this technique requires at least three key strokes for one kanji and consumes time to locate the needed component on the keyboard.
Furthermore, it requires the complicated extra step of putting input codes through a kanji dictionary to combine component codes into a code per kanji. No library is currently using this system by itself.

Kana Keyboard System

The keyboard of a Japanese syllabary typewriter has adapted the conventional English typewriter keyboard and has standard Roman-alphabet keys that contain katakana in shift (figure 4). Since the number of katakana exceeds that of Roman letters, the katakana keys are extended to the keys for numerals and punctuation marks. This means that this typewriter can be used either for kana or for Roman letters by changing its mode.

[Figure 4. Kana typewriter keyboard.]

Two-Key Stroke Method

This variation of the kana keyboard system is referred to as the two-key stroke system, and uses kana as codes, not as letters. Roman letters can be used as codes, too. There are two different subvariations:

Location correspondence. Keys are divided into two sections: one for the right hand, and the other for the left hand. If two keys are to be stroked, there will be four possible combinations of key strokes: (1) left hand twice, (2) left and right, (3) right and left, and (4) right twice. The keyboard is accompanied by a kanji table in which characters are arranged in several blocks and in a certain order within each block. Each block, which contains twenty-six kanji in a four-by-six arrangement, is made according to each combination of strokes: the first block is left and left; the second block is left and right, etc. Within each block, the ordinate consists of keys for the first stroke and the abscissa for the second. A kanji at the intersection of the two indicates which keys are to be typed. When kanji A is to be typed (see figure 5), since it is in block A, indicating the stroke combination left and left, the operator types A and W with the left hand. If kanji B is to be typed, the operator types key A with the left hand and key P with the right.
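In the two-key stroke scheme each stroke contributes one byte, and the pair forms the kanji's two-byte code; shifting changes a bit to reach a second table of kanji. A minimal sketch, with hypothetical one-byte key codes:

```python
# Sketch of the two-key-stroke composite code. Assumptions: each key
# carries a hypothetical one-byte code; the first stroke supplies the
# high byte and the second the low byte of a kanji's two-byte code;
# shifting flips the top bit to select a second kanji table.

KEY_CODES = {"Q": 0x31, "W": 0x32, "A": 0x41, "P": 0x50}  # hypothetical

def composite_code(first_key, second_key, shifted=False):
    """Two strokes -> one two-byte code, as a four-digit numeral."""
    code = (KEY_CODES[first_key] << 8) | KEY_CODES[second_key]
    if shifted:
        code |= 0x8000  # changed bit selects the alternate kanji table
    return f"{code:04X}"

assert composite_code("A", "P") == "4150"
assert composite_code("A", "P", shifted=True) == "C150"
```

Because the code is produced by touch-typed strokes rather than by searching a keyboard, this is the fastest method described, at the cost of the operator memorizing the block arrangement or the kana associations.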
Each key has a byte code, and a combination of two key strokes makes a composite, a two-byte binary code, for a kanji. The bit may be changed by shifting, and different kanji can be typed if another table is prepared for kanji with different bits.

[Figure 5. Kanji table for location correspondence method.]

Association memory method. In this method, each kanji is given two kana which usually represent a reading of that kanji. The operator associates a kanji to be input with the two kana assigned to that kanji, and types them with two strokes using the kana keys.

Both of the key-stroke methods are economical as well as convenient because of the wide availability of kana typewriters. Mainly for that reason, both of these systems have been well accepted and are expected to grow further. Since this touch method does not require the operator to look for the character on the keyboard, it is the fastest to operate and is considered suitable for input in quantity. It is possible to input 60 to 120 characters per minute. The only drawback is that the operator must get acquainted with the arrangement of kanji in the first variation, and must memorize all the associated kana spellings for many kanji in the case of the second variation. In either case, the operator must be professionally trained. The Japan Information Center for Science and Technology, which indexes many scientific publications, employs a vendor who uses the location correspondence variation of this system for inputting information.

Display Selection

This also uses a kana typewriter, with a screen in front.
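Display selection can be sketched with a hypothetical homophone table: the kana spelling retrieves every kanji writing that shares the sound, and the operator picks the intended one. The readings and candidates below are ordinary dictionary facts, but the table itself and function names are inventions for illustration.

```python
# Sketch of display selection. Assumption: a hypothetical table
# mapping a kana spelling to all kanji writings sharing that sound.

HOMOPHONES = {
    "こうえん": ["公園", "講演", "公演", "後援"],  # all read "kōen"
}

def candidates(kana_word):
    """The group of kanji displayed on screen for a kana spelling."""
    return HOMOPHONES.get(kana_word, [])

def select(kana_word, choice):
    """Operator picks the intended kanji, e.g. with a light pen."""
    return candidates(kana_word)[choice]

assert select("こうえん", 1) == "講演"
```

The homonym problem noted earlier is exactly what makes this selection step necessary: kana alone cannot distinguish the four candidates above.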
When a word is typed in kana, a group of kanji with that sound are displayed on the screen. The operator chooses the right kanji with a light pen, a slow but accurate operation. The operator does not have to be specially trained for this.

Kana-Kanji Conversion

In contrast to the conventional approach of full keyboard inputting, an entirely new method for inputting kanji is gaining popularity as the availability of sophisticated software increases. This uses a kana typewriter keyboard to input Japanese in syllabary or romanized form, converting it to kanji by software. There are two ways of conversion: one converts word by word, and the other sentence by sentence.

Stenotype

The stenotype is a typewriterlike device. The operator must be able to take shorthand. When the stenotype is used, it punches words on paper tapes. Therefore, inputting is high speed. However, the operator must receive proper training.

Optical Character Recognition

This system, developing quickly and expected to gain wider use, can scan a maximum of 2,500 printed kanji.[8] One variation connects a writing tablet to a computer so that as the operator writes kanji on the tablet, the computer scans them in stroke order. This function of scanning by stroke order is considered to be an advantage for processing some types of Japanese documents. The drawbacks are that the system is still very expensive, and the number of recognizable characters is fewer than 2,000.

Voice Recognition

This is an oral-visual system, in which the human voice is read by a computer. Obviously the most difficult to develop, this system is still in an experimental stage. However, a prototype has been demonstrated at various exhibitions, and the system apparently possesses great potential.

Summary

Pattern configuration and output devices for Japanese characters are basically the same as those for English.
However, the pattern generation of characters is mechanically more complicated than that of the Roman alphabet, because kanji have a more complicated structure than Roman letters and the number of components is greater. Each kanji is represented by a two-byte binary code rather than one byte as in the Roman alphabet. Because of this, the efficiency of retrieval is low. Presently, hard copy and typesetting for printing of hard copy are the major output forms, and very little on-line retrieval of information with kanji is in current operation.

Problems Particular to Kanji Processing

Among the numerous problems in processing kanji through computers, the major ones are: (1) which kanji are to be included; (2) how many characters are to be handled; (3) what code should be assigned and how it should be arranged on the keyboard or table; and (4) how the kanji not included on the keyboard should be treated.

In the early stage of kanji computer development, different institutions handled the problems in ways best suited to their individual needs, according to the nature of the literature covered, the amount of literature processed, and the kinds of output needed. They experimented with the then best available capabilities. As a result, the finished systems are all independent and mutually incompatible. Standardization is obviously necessary for exchange of information among the systems. In order to set standards for selection of characters and assignment of codes, JIS (Japan Industrial Standard) C6226-1978 has been compiled by the Japan Association for Development of Information Processing. This is a table of characters designed for information exchange (a portion of which is shown in figure 6). It has a one-byte code as its abscissa and another as its ordinate. Characters are arranged so that the intersection of abscissa and ordinate determines a kanji whose code consists of four numerals, two from the abscissa and two from the ordinate.
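The four-numeral code composition can be sketched as follows. The 94-by-94 extent is the table size JIS C6226 (today's JIS X 0208) actually uses; the specific coordinates in the example are illustrative.

```python
# Sketch of a JIS C6226-style code assignment: a character's code is
# four numerals, two from the abscissa (row) and two from the
# ordinate (column) of the 94-by-94 table.

def jis_code(row, column):
    """Compose a four-numeral character code from table coordinates."""
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("JIS C6226 coordinates run from 1 to 94")
    return f"{row:02d}{column:02d}"

# The character at row 16, column 1 has code "1601".
assert jis_code(16, 1) == "1601"
```

Because every character's code is derived from its fixed position in one published table, two systems that both adopt the standard can exchange coded text without a private conversion dictionary.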
Included in the table are kana in both styles; Roman, Greek, and Cyrillic alphabets in upper and lower cases; diacritical marks; numerals; and punctuation marks, as follows:

1. Special characters: 108
2. Numerals (Arabic): 10
3. Roman alphabets: 52
4. Hiragana: 83
5. Katakana: 86
6. Greek alphabets: 48
7. Cyrillic alphabets: 66
8. Kanji: 6,349
Total: 6,802[9]

In the first section of the table, numerals, alphabets, kana, and special characters are grouped. In the second section, a total of 2,965 frequently used kanji are arranged as the first-priority group, and an additional 3,384 kanji are selected as the second group[10] in the bottom half of the table. Kanji are printed in the preferred style for printing typeface. This table will resolve problems 1 to 3 mentioned above. Institutions that had arranged their own codes for kanji, including the National Institute of Japanese Literature, are now automatically translating their own codes into JIS codes.

In cases where needed kanji are not included on the keyboard, handling varies. With the Japanese typewriter, because each kanji is inscribed on a typeface, only the kanji on that typeface is printed when the type bar is stroked. Therefore, only kanji that have typefaces can be input in this system, while some other handling is possible in other methods.

[Figure 6. Code of the Japanese graphic character set for information interchange (portion).]

While the number of characters that can be accommodated on keyboards is limited to 2,000 to 3,500, depending on the type of equipment, character generators have the capability of outputting more than the number of characters on the keyboard. Figure 7 shows their relationship. Characters that are in the generator but not on the keyboard must frequently be processed, because the number of characters needed for most documents could reach 6,000 to 6,500. Using a shift key to enter another mode is a fairly common technique for inputting uncommon kanji. The keyboard may not have a character but, if the character generator has it, the code for that character can be input by shifting. For example, if a character on the keyboard has the code 0117, a bit is changed so the code 8117 can be typed by shifting and typing that key. If the code 8117 is assigned to another kanji not on the keyboard but indexed in the dictionary, it can be input. This applies for the kanji teletypewriter, tablet style, and the two-key stroke variations of the kana typewriter.
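Read as hexadecimal, the change from 0117 to 8117 is simply the top bit of the two-byte code being set; a minimal sketch, assuming the four-digit codes are hexadecimal (the article does not state the base, so this reading is an assumption):

```python
# Sketch of the shift-key bit change: shifting sets the high bit of a
# two-byte code, producing the code of a kanji that is indexed in the
# code dictionary but absent from the keyboard.
# Assumption: the four-digit codes (0117, 8117) are hexadecimal.

def shifted_code(code):
    """Set the high bit of a two-byte code (e.g. 0117 -> 8117)."""
    return code | 0x8000

assert f"{shifted_code(0x0117):04X}" == "8117"
```

One keyboard position thus addresses two dictionary entries, which is how a keyboard of 2,000 to 3,500 keys can reach a character generator holding far more.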
In the kanji teletypewriter system used by the National Diet Library, the keyboard accommodates 2,650 characters, while its character generator has the capability for 5,717. Operators in the National Diet Library input kanji that are not on the keyboard by using the component pattern input method. Or, if the operator finds the kanji code in the specially compiled dictionary in which codes for kanji are indexed, a shift key is used to change the bit, thus creating the code for a kanji not on the keyboard. Most other tablet systems use code dictionaries. In the two-key stroke variations of kana typewriters, tables of kanji for second and third or more shifts can be built, especially when the location association method is used.

[Figure 7. Kanji creating capability: keyboard characters, within character generator capability, within system capability, outside system capability.]

The handling of kanji that are not in character generators is more difficult. Only the digital character generator, the kind that uses either dot or stroke, can add characters fairly easily. In the flying-spot system, characters can be added, but it must be done professionally with an additional character cylinder and is very costly. The National Diet Library, which now uses flying spot, limits addition of kanji to a minimum. Because its output is solely in printed book form, the National Diet Library inputs a fill character for kanji not in the system. When the phototypeset masters are made, the fill characters are replaced by typeset characters. The use of a fill character suffices only when the output is phototypeset, because there is a step to replace fill characters by typeface. However, as long as the database includes many fill characters on the magnetic tapes, the on-line retrieval of information or later utilization of tapes becomes unsatisfactory.
the national institute of japanese literature uses a dot matrix and prints by wire-dot impact. if a kanji is not in the character generator, the institute's staff composes the kanji in an enlarged dot matrix and creates the capability for printing in the generator. if the kanji made in such a way is used only once, the kanji pattern is not stored in the character generator, so that the generator does not reach its full capacity quickly. the enlarged dot composite for kanji created in the institute is filed and indexed for future use. most other institutions simply do not use those less commonly used kanji, and substitute kana for them. in addition to the problems common to any character output, such as size and number of dots, the problem of the space for kanji in relation to other characters and the choice of vertical or horizontal printing of japanese sentences with kanji must be considered. kanji have many strokes and, as mentioned before, are expressed by two-byte codes. each kanji needs a double space when displayed on screens or printed. when a kanji is used with numerals or kana, the kanji part looks fine but the numerical part has too much space between each numeral. therefore, input of kanji is done in a kanji mode and input of kana, roman alphabets, and numerals is in a kana-numerical mode. in this way a multidigit figure looks like one whole figure rather than a line of one-digit figures. some formal documents must be printed in the traditional vertical arrangement. to cope with this situation, some line printers have the capability to precompose a vertical page before printing it. there are multicolor crts on the market that can be used for the retrieval of library-related information, e.g., main entry in red, series statement in yellow. one last problem that must be considered is that most of these systems require trained operators, or else the operation is very slow.
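the spacing rule described above can be sketched as a cell-width computation: each kanji occupies a double cell, while characters entered in the kana-numerical mode occupy a single cell. the character range used here is a simplified assumption (cjk unified ideographs only), not a full east-asian-width table:

```python
# sketch of the display spacing rule: kanji (two-byte codes) take a
# double cell on screens and printers; kana-numerical-mode characters
# take a single cell. range is simplified for illustration.

def cell_width(ch: str) -> int:
    """2 cells for a kanji (cjk unified ideographs), 1 otherwise."""
    return 2 if "\u4e00" <= ch <= "\u9fff" else 1

def line_width(text: str) -> int:
    """total number of display cells a string occupies."""
    return sum(cell_width(c) for c in text)

assert line_width("1981") == 4   # numerals: one cell each
assert line_width("漢字") == 4   # two kanji: two cells each
```

this is why mixing modes matters: a multidigit figure typed in kanji mode would be spread over double cells, while in kana-numerical mode it reads as one whole figure.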
the information is edited and compiled by the editors and prepared for input in the form of worksheets. so are the revisions. at various stages of revising the text, the information must be printed, given to the editors, and revised. further developments in simplifying input and revising texts for efficient flow are to be expected. application of kanji systems. processing of vernacular-language materials in their own writing systems is considered vital for research libraries in this country. in adopting the kanji systems in such libraries, there are three major factors that must be considered: the objectives and needs of the institution, the cost, and the personnel. first, the institution must know what it must accomplish by means of such a system. the needs may not be the same for all institutions. is the system for retrieving catalog information, or for inputting catalog and other information? is it for internal processing or patron use? is it for a large bibliographic utility to distribute information to its subscribers, or for an individual institution to process its own information? could the system be shared by the department of asian studies in any way? the character set needs of the institution are a major factor in choosing the system. since input and output devices are different, i.e., one cannot input kanji on a crt and retrieve kanji from the same crt, the institution must consider how much it will need to input, or whether it can rely on available data bases. some institutions may not need any input equipment if they utilize available data bases. if japan marc and other tapes are made accessible by a large bibliographic utility in this country, the institutions will be able to obtain bibliographic information in kanji on the screen. if they want only catalog cards or a com catalog, they will not need any equipment except the terminals supported by the utility.
if they want to input, they must consider what form or forms of output they need and how to create the characters not included in the system, in addition to which system to choose. second, cost is an important factor. is the expense justified in terms of the other needs of the library? what can be accomplished per dollar spent? the kanji systems are still expensive, though the cost will eventually be reduced. how much can be spent and how much continuing support can be expected are factors that modify system expectations. the budget must include not only the one-time hardware cost, but also the software, maintenance, and personnel. third, the availability of personnel will affect the choice of system. what degree of language expertise does the system require in each stage of operation, such as inputting, maintenance, and programming? does it need terminal operators trained in those languages? what other personnel does the system need as far as language-related qualifications are concerned? apart from the three major factors discussed above, there are some technical aspects that must be adjusted to library situations in this country. since japanese, chinese, and korean use the same chinese ideographs to different degrees and in different ways, libraries considering automated processing of these language materials are probably expected to handle all three languages with the same system, to say nothing of the other non-roman scripts. problems will arise in selecting characters for inclusion in the system. as pointed out earlier with regard to japanese character processing, there are simply too many characters for the present capacity of any computer. if the korean and chinese languages are to be handled by the same computer, this problem multiplies. the korean alphabet, called hangul, would have to be included. chinese has more characters than japanese.
worse yet is the fact that some kanji are simplified in different ways in japan and china, so that they are neither recognizable nor interchangeable between them. it will be an enormous task to accommodate both in the same system. another problem is the arrangement and indexing of kanji. if a full keyboard, a japanese typewriter keyboard, or a two-key stroke system, especially its location association method by kana typewriter, is considered for japanese, chinese, and korean, the arrangement of the characters must be indexed and accessed for the three languages, in addition to the multiple readings found in japanese. for example, kanji on the japanese keyboard are usually arranged by the initial sound of the japanese reading of the kanji. this arrangement will be useless for chinese and korean, because japanese readings are not the same as chinese or korean readings. the arrangement of kanji on the keyboards must be on some new principle common to these languages. even if the kana-kanji conversion is used, and roman alphabet-kanji conversion software is adopted, software to handle those three languages must be developed. such software would have to be highly sophisticated. the presence of many homonyms in chinese will cause a great problem to the extent that the system relies on transliterated or romanized forms of the language. recognition of the many identical spellings in different language contexts will be extremely difficult. the above discussion is based on what is currently available in japan. the combination of existing inputting, generating, and outputting equipment developed by japanese technology opens up various possibilities for us to build effective systems in this country. acknowledgment: this article is based on a study conducted in japan as a japan foundation professional fellow, and as a visiting research fellow of the center for research on information and library science, university of tokyo. references 1.
national institute of japanese literature, implementation of a computer system and a kanji handling system at nijl (tokyo: nijl, 1978), p.16. 2. toshio ishiwata, "kanji shori kenkyu ni motomerareru mono" ["requirements for study on kanji processing"], computopia no.9 (1977), p.35. 3. gendai yogo no kiso chishiki, 1980 [basic knowledge on current terms, 1980] (tokyo: jiyukokuminsha, 1980), p.999. 4. figures are taken from the following two sources and compiled by the author: hasegawa, jitsuro, "kanji shori sochi" ["kanji processing devices"], joho shori [information processing] 19, no.4:353 (april 1978); sugai, kazuro, "kanji nyu-shutsuryoku sochi no kaihatsu doko" ["a trend in development of kanji input-output devices"], business communication 16, no.7:41 (1979). 5. used for the pattern input mentioned in the following component pattern input system. 6. national diet library, library automation in the national diet library (tokyo: the library, 1979), p.4. 7. ibid., p.7. 8. asia business consultants is using an optical character recognition system that can scan handwritten kana and numerals on a small scale to input and process catalog information for a library collection. 9. "joho kokan no tame no kanji fugo no hyojunka" ["standardization of kanji code for information interchange"], kagaku gijutsu bunken sabisu [scientific and technical documents service] no.50 (1978), p.29. 10. ibid., p.28. ichiko morita is assistant professor in library administration and head, automated processing division, the ohio state university libraries. editor's notes: most jola readers are aware of significant delays in publication in the last volume. susan k. martin, a former editor of jola, and richard d. johnson, a former editor of college & research libraries, gave freely of their time and energy to bring the journal back on schedule.
mary madden, judith schmidt, and the members of the editorial board under the leadership of charles husbands all worked closely with sue and richard in this effort. this was a second time around for sue, who undertook a similar task when she assumed the jola editorship in 1972. the jola readership and this editor owe debts of gratitude to sue, richard, and all the others who helped. we do not foresee major changes in the format of the journal as established principally under the editorships of kilgour and martin. we look for increased strength in our book reviews section under the editorship of david weisbrod. the addition of tom harnish as assistant editor for video technologies indicates our recognition of the growing importance of video-based information systems. we encourage reader suggestions. we welcome brief communications of successes or failures that might be of interest to other readers. letters to the editor about any of our feature articles or communications are solicited. the next generation library catalog | yang and hofmann. sharon q. yang and melissa a. hofmann. the next generation library catalog: a comparative study of the opacs of koha, evergreen, and voyager. open source has been the center of attention in the library world for the past several years. koha and evergreen are the two major open-source integrated library systems (ilss), and they continue to grow in maturity and popularity. the question remains as to how much we have achieved in open-source development toward the next-generation catalog compared to commercial systems. little has been written in the library literature to answer this question. this paper intends to answer this question by comparing the next-generation features of the opacs of two open-source ilss (koha and evergreen) and one proprietary ils (voyager's webvoyage).
much discussion has occurred lately on the next-generation library catalog, sometimes referred to as the library 2.0 catalog or "the third generation catalog."1 different and even conflicting expectations exist as to what the next-generation library catalog comprises: in two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. it provides its intended audience with a more effective means for finding and using data and information.2 such expectations, despite their vagueness, eventually took concrete form in 2007.3 among the most prominent features of the next-generation catalog are a simple keyword search box, enhanced browsing possibilities, spelling corrections, relevance ranking, faceted navigation, federated search, user contribution, and enriched content, just to mention a few. over the past three years, libraries, vendors, and open-source communities have intensified their efforts to develop opacs with advanced features. the next-generation catalog is becoming the current catalog. the library community welcomes open-source integrated library systems (ilss) with open arms, as evidenced by the increasing number of libraries and library consortia that have adopted or are considering open-source options, such as koha, evergreen, and the open library environment project (ole project). librarians see a golden opportunity to add features to a system that would take years for a proprietary vendor to develop. open-source opacs, especially that of koha, seem to be more innovative than their long-established proprietary counterparts, as our investigation shows in this paper. threatened by this phenomenon, ils vendors have rushed to improve their opacs, modeling them after the next-generation catalog. for example, ex libris pushed out its new opac, webvoyage 7.0, in august of 2008 to give its opac a modern touch. one interesting question remains.
in a competition for a modernized opac, which opac is closest to our visions for the next-generation library catalog: open source or proprietary? the comparative study described in this article was conducted in the hope of yielding some information on this topic. for libraries facing options between open-source and proprietary systems, "a thorough process of evaluating an integrated library system (ils) today would not be complete without also weighing the open source ils products against their proprietary counterparts."3 ■■ scope and purpose of the study the purpose of the study is to determine which opac of the three ilss—koha, evergreen, or webvoyage—offers more in terms of services and is more comparable to the next-generation library catalog. the three systems include two open-source ilss and one proprietary ils. koha and evergreen were chosen because they are the two most popular and fully developed open-source ilss in north america. at the time of the study, koha had 936 implementations worldwide; evergreen had 543 library users.4 we chose webvoyage for comparison because it is the opac of the voyager ils by ex libris, the biggest ils vendor in terms of personnel and marketplace.5 it also is one of the more popular ilss in north america, with a customer base of 1,424 libraries, most of which are academic.6 as the sample only includes three ilss, the study is very limited in scope, and the findings cannot be extrapolated to all open-source and proprietary catalogs. but, hopefully, readers will gain some insight into how much progress libraries, vendors, and open-source communities have achieved toward the next-generation catalog. ■■ literature review a review of the library literature found two relevant studies on the comparison of opacs in recent years. the first study was conducted by two librarians in slovenia investigating how much progress libraries had made toward the next-generation catalog.7 (sharon q. yang (yangs@rider.edu) is systems librarian and melissa a. hofmann (mhofmann@rider.edu) is bibliographic control librarian, rider university. information technology and libraries, september 2010.) six online catalogs were examined and evaluated, including worldcat, the slovene union catalog cobiss, and those of four public libraries in the united states. the study also compared services provided by the library catalogs in the sample with those offered by amazon. the comparison took place primarily in six areas: search, presentation of results, enriched content, user participation, personalization, and web 2.0 technologies applied in opacs. the authors gave a detailed description of the research results, supplemented by tables and snapshots of the catalogs in comparison. the findings indicated that "the progress of library catalogues has really been substantial in the last few years." specifically, the library catalogues have made "the best progress on the content field and the least in user participation and personalization." when compared to services offered by amazon, the authors concluded that "none of the six chosen catalogues offers the complete package of examined options that amazon does."8 in other words, library catalogs in the sample still lacked features compared to amazon. the other comparative study was conducted by linda riewe, a library school student, in fulfillment of the requirements for her master's degree from san jose state university. the research described in her thesis is a questionnaire survey, targeted at 361 libraries, that compares open-source (specifically, koha and evergreen) and proprietary ilss in north america. more than twenty proprietary systems were covered, including horizon, voyager, millennium, polaris, innopac, and unicorn.9 only a small part of her study was related to opacs.
it involved three questions about opacs and asked librarians to evaluate the ease of use of their ils opac's search engines, their opac search engine's completeness of features, and their perception of how easy it is for patrons to make self-service requests online for renewals and holds. a scale of 1 to 5 was used (1 = least satisfied; 5 = very satisfied) regarding the three aspects of opacs. the mean and median satisfaction ratings for open-source opacs were higher than those of proprietary ones. koha's opac was ranked 4.3, 3.9, and 3.9, respectively, in mean, the highest on the scale in all three categories, while the proprietary opacs were ranked 3.9, 3.6, and 3.6.10 evergreen fell in the middle, still ahead of proprietary opacs. the findings reinforced the perception that open-source catalogs, especially koha, offer more advanced features than proprietary ones. as riewe's study focused more on the cost of and user satisfaction with ilss, it yielded limited information about the connected opacs. no comparative research has measured the progress of open-source versus proprietary catalogs toward the next-generation library catalog; therefore the comparison described in this paper is the first of its kind. as only koha, evergreen, and voyager's opacs are examined in this paper, the results cannot be extrapolated. studies on a larger scale are needed to shed light on the progress librarians have made toward the next-generation catalog. ■■ method the first step of the study was identifying and defining a set of measurements by which to compare the three opacs. a review of library literature on the next-generation library catalog revealed different and somewhat conflicting points of view as to what the next-generation catalog should be. as marshall breeding put it, "there isn't one single answer.
we will see a number of approaches, each attacking the problem somewhat differently."11 this study decided to use the most commonly held visions, which are summarized well by breeding and by morgan's lita executive summary.12 the ten parameters identified and used in the comparison were taken primarily from breeding's introduction to the july/august 2007 issue of library technology reports, "next-generation library catalogs."13 the ten features reflect some librarians' visions for a modern catalog. they serve as additions to, rather than replacements of, the feature sets commonly found in legacy catalogs. the following are the definitions of each measurement: ■■ a single point of entry to all library information: "information" refers to all library resources. the next-generation catalog contains not only bibliographic information about printed books, video tapes, and journal titles but also leads to the full text of all electronic databases, digital archives, and any other library resources. it is a federated search engine for one-stop searching. it not only allows for one search leading to a federation of results, it also links to full-text electronic books and journal articles and directs users to printed materials. ■■ state-of-the-art web interface: library catalogs should be "intuitive interfaces" and "visually appealing sites" that compare well with other internet search engines.14 a library's opac can be intimidating and complex. to attract users, the next-generation catalog looks and feels similar to google, amazon, and other popular websites. this criterion is highly subjective, however, because some users may find google and amazon anything but intuitive or appealing. the underlying assumption is that some internet search engines are popular, and a library catalog should be similar in order to be popular itself. ■■ enriched content: breeding writes, "legacy catalogs tend to offer text-only displays, drawing only on the marc record.
a next-generation catalog might bring in content from different sources to strengthen the visual appeal and increase the amount of information presented to the user."15 the enriched content includes images of book covers, cd and movie cases, tables of contents, summaries, reviews, and photos of items that traditionally are not present in legacy catalogs. ■■ faceted navigation: faceted navigation allows users to narrow their search results by facets. the types of facets may include subjects, authors, dates, types of materials, locations, series, and more. many discovery tools and federated search engines, such as villanova university's vufind and innovative interfaces' encore, have used this technology in searches.16 auto-graphics also applied this feature in their opac, agent iluminar.17 ■■ simple keyword search box: the next-generation catalog looks and feels like popular internet search engines. the best example is google's simple user interface. that means that a simple keyword search box, instead of a controlled vocabulary or specific-field search box, should be presented to the user on the opening page, with a link to an advanced search for users in need of more complex searching options. ■■ relevancy: traditional ranking of search results is based on the frequency and positions of terms in bibliographic records during keyword searches. relevancy has not worked well in opacs. in addition, popularity is another factor that has not been taken into consideration in relevancy ranking. for instance, "when ranking results from the library's book collection, the number of times that an item has been checked out could be considered an indicator of popularity."18 by the same token, the size and font of tags in a tag cloud or the number of comments users attach to an item may also be considered relevant in ranking search results.
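the broader relevancy vision just described, a text-match score combined with a popularity bump from circulation counts, can be sketched as follows. the weighting and record shape are illustrative assumptions, not how any of the three opacs actually ranks:

```python
# sketch of popularity-aware relevancy ranking: a crude term-frequency
# base score plus a bump proportional to checkout counts. the 0.1
# weight is an arbitrary illustration.

def score(record: dict, query_terms: list, popularity_weight: float = 0.1) -> float:
    text = record["title"].lower()
    base = sum(text.count(t.lower()) for t in query_terms)  # term frequency
    return base + popularity_weight * record.get("checkouts", 0)

records = [
    {"title": "python programming", "checkouts": 0},
    {"title": "programming in python", "checkouts": 40},
]
ranked = sorted(records, key=lambda r: score(r, ["python"]), reverse=True)
assert ranked[0]["checkouts"] == 40  # on a text-score tie, the popular item wins
```

tag-cloud size or comment counts could feed into the same bump term in place of checkouts.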
so far, almost no opacs are capable of incorporating circulation statistics into relevancy ranking. ■■ "did you mean . . . ?": when a search term is not spelled correctly or nothing is found in the opac in a keyword search, the spell checker will kick in and suggest the correct spelling or recommend a term that may match the user's intended search term. for example, a modern catalog may generate a statement such as "did you mean . . . ?" or "maybe you meant . . . ." this may be a very popular and useful service in modern opacs. ■■ recommendations and related materials: the next-generation catalog is envisioned as promoting reading and learning by making recommendations of additional related materials to patrons. this feature is an imitation of amazon and other websites that promote selling by stating "customers who bought this item also bought . . . ." likewise, after a search in the opac, a statement such as "patrons who borrowed this book also borrowed the following books . . ." may appear. ■■ user contribution—ratings, reviews, comments, and tagging: legacy catalogs only allow catalogers to add content. in the next-generation catalog, users can be active contributors to the content of the opac. they can rate, write reviews, tag, and comment on items. user contribution is an important indicator of use and can be used in relevancy ranking. ■■ rss feeds: the next-generation catalog is dynamic because it delivers lists of new acquisitions and search updates to users through rss feeds. modern catalogs are service-oriented; they do more than provide a simple display of search results. the second step is to apply these ten visions to the opacs of koha, evergreen, and webvoyage to determine if they are present or absent. the opacs used in this study included three examples from each system: product demos and live catalogs randomly chosen from the user lists on the product websites.
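the "did you mean . . . ?" behavior defined above, suggesting a close spelling while leaving the choice to the user rather than silently rewriting the query, can be sketched with a fuzzy match against the indexed vocabulary. the term list and cutoff are illustrative assumptions; `difflib` stands in for a real spell checker:

```python
# sketch of a "did you mean . . . ?" suggestion: when a term is not in
# the index, offer the closest indexed term instead of normalizing the
# query behind the user's back.

import difflib

INDEXED_TERMS = ["women", "homosexuality", "library", "catalog"]  # toy index

def did_you_mean(query: str):
    """return a suggested term, or None when the query matches the index."""
    if query in INDEXED_TERMS:
        return None
    matches = difflib.get_close_matches(query, INDEXED_TERMS, n=1, cutoff=0.6)
    return matches[0] if matches else None

assert did_you_mean("homoexuality") == "homosexuality"  # missing "s" still matches
assert did_you_mean("library") is None                  # exact hit: no suggestion
```

offering the suggestion rather than applying it preserves deliberate spellings such as "womyn," the case discussed later in the findings.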
the latest releases at the time of the study were koha 3.0, evergreen 2.0, and webvoyage 7.1. in case of discrepancies between product descriptions and reality, we gave precedence to reality over claims. in other words, even if the product documentation lists and describes a feature, this study does not include it if the feature is not in action in either the demo or live catalogs. despite the fact that a planned future release of one of the investigated opacs may add a feature, this study only recorded what existed at the time of the comparison. the following are the opacs examined in this paper. koha ■■ koha demo for academic libraries: http://academic.demo.kohalibrary.com/ ■■ wagner college: http://wagner.waldo.kohalibrary.com/ ■■ clearwater christian college: http://ccc.kohalibrary.com/ evergreen ■■ evergreen demo: http://demo.gapines.org/opac/en-us/skin/default/xml/index.xml ■■ georgia pines: http://gapines.org/opac/en-us/skin/default/xml/index.xml ■■ columbia bible college: http://columbiabc.evergreencatalog.com/opac/en-ca/skin/default/xml/index.xml webvoyage ■■ rider university libraries: http://voyager.rider.edu ■■ renton college library: http://renton.library.ctc.edu/vwebv/searchbasic ■■ shoreline college library: http://shoreline.library.ctc.edu/vwebv/searchbasic the final step includes data collection and compilation. a discussion of findings follows. the study draws conclusions about which opac is more advanced and has more features of the next-generation library catalog. ■■ findings each of the opacs of koha, evergreen, and webvoyage is examined for the presence of the ten features of the next-generation catalog. single point of entry for all library information: none of the opacs of the three ilss provides true federated searching.
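the single-point-of-entry criterion being tested here amounts to fanning one query out to several sources and merging the result lists. a minimal sketch with in-memory stand-ins for the catalog and an article database; the data and field names are illustrative assumptions, not any real api:

```python
# sketch of federated ("one-stop") search: one query against several
# sources, results merged into a single list. sources are toy data.

CATALOG = [{"title": "unix update", "source": "catalog"}]
DATABASES = [{"title": "unix update (full text)", "source": "serials solutions"}]

def federated_search(query: str) -> list:
    """collect matching records from every configured source."""
    hits = []
    for source in (CATALOG, DATABASES):
        hits += [r for r in source if query.lower() in r["title"].lower()]
    return hits

results = federated_search("unix")
assert {r["source"] for r in results} == {"catalog", "serials solutions"}
```

what the three opacs actually do falls short of this: at best (koha), the catalog record links out to the full-text source rather than searching it.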
to varying degrees, each is limited in access, showing an absence of content from electronic databases, digital archives, and other sources that generally are not located in the legacy catalog. of the three, koha is more advanced. while webvoyage and evergreen only display journal-holdings information in their opacs, koha links journal titles from its catalog to proquest's serials solutions, thus leading users to full-text journals in the electronic databases. the example in figure 1 (koha demo) shows the journal title unix update with an active link to the full-text journal in the availability field. the link takes patrons to serials solutions, where full text at the journal-title level is listed for each database (see figure 2). each link will take you into the full text in each database. state-of-the-art web interface: as beauty is in the eye of the beholder, the interface of a catalog can be appealing to one user but prohibitive to another. with this limitation in mind, the out-of-the-box user interface at the demo sites was considered for each opac. all three catalogs have google-like simplicity in presentation. all of the user interfaces are highly customizable; it largely depends on the library to make the user interface appealing and welcoming to users. figures 3–5 show snapshots from each ils's demo site and have not been customized. however, there are a few differences in the "state of the art." for one, koha's navigation between screens relies solely on the browser's forward and back buttons, while webvoyage and evergreen have internal navigation buttons that more efficiently take the user between title lists, headings lists, and record displays, and between records in a result set. while all three opacs offer an advanced search page with multiple boxes for entering search terms, only webvoyage makes the relationship between the terms in different boxes clear.
by the use of a drop-down box, it makes explicit that the search terms are by default anded and also allows for the selection of or and not. in koha's and evergreen's advanced search, however, the terms are anded only, a fact that is not at all obvious to the user. in the demo opacs examined, there is no option to choose or or not between rows, nor is there any indication that the search is anded. the point of providing multiple search boxes is to guide users in constructing a boolean search without their having to worry about operators and syntax. in koha, however, users have to type an or or not statement themselves within the text box, thus defeating the purpose of having multiple boxes. while evergreen allows for a not construction within a row ("does not contain"), it does not provide an option for or ("contains" and "matches exactly" are the other two options available). see figures 6–8. (figure 1. link to full-text journals in serials solutions in koha. figure 2. links to serials solutions from koha.) thus koha's and evergreen's advanced search is less than intuitive for users and certainly less functional than webvoyage's. enriched content: to varying degrees, enriched content is present in all three catalogs, with koha providing the most. while all three catalogs have book covers and movie-container art, koha has much more in its catalog. for instance, it displays tags, descriptions, comments, and amazon reviews. webvoyage displays links to google books for book reviews and content summaries but does not have tags, descriptions, and comments in the catalog. see figures 9–11. faceted navigation: the koha opac is the only catalog of the three to offer faceted navigation. the "refine your search" feature allows users to narrow search results by availability, places, libraries, authors, topics, and series.
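the multiple-box boolean search compared above can be sketched as a query builder: each row carries a term and an explicit operator (webvoyage-style drop-down), so the relationship between boxes never has to be guessed, whereas implicit and-only joining (the koha/evergreen behavior) forces users to type operators themselves. the query syntax here is an illustrative assumption:

```python
# sketch of an advanced-search query builder: rows of (operator, term)
# pairs joined with an explicit AND/OR/NOT drop-down per row, versus
# implicit and-only joining.

def build_query(rows: list) -> str:
    """rows: [(operator, term), ...]; the first row's operator is ignored."""
    parts = []
    for i, (op, term) in enumerate(rows):
        parts.append(term if i == 0 else f"{op} {term}")
    return " ".join(parts)

def build_query_and_only(terms: list) -> str:
    """the implicit behavior: every box is silently anded."""
    return " AND ".join(terms)

# explicit operators make the relationship between boxes clear:
assert build_query([("AND", "dogs"), ("OR", "cats"), ("NOT", "wild")]) \
    == "dogs OR cats NOT wild"
assert build_query_and_only(["dogs", "cats"]) == "dogs AND cats"
```

the design point matches the article's criticism: multiple boxes only help if the operator between them is visible and selectable.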
Clicking on a term within a facet adds that term to the search query and generates a narrower list of results. The user may then choose another facet to further refine the search. While Evergreen appears to have faceted navigation at first glance, it actually does not possess this feature. The following facets appear after a search generates hits: "relevant subjects," "relevant authors," and "relevant series." But choosing a term within a facet does not narrow down the previous search. Instead, it generates an entirely new search with the selected term; it does not add the new term to the previous query. Users must manually combine the terms in the simple search box or through the advanced search page. WebVoyage also does not offer faceted navigation; it only provides an option to "filter your search" by format, language, and date when a set of results is returned. See figures 12–14.

Figure 9. Koha enriched content
Figure 10. Evergreen enriched content
Figure 11. Voyager enriched content
Figure 12. Koha faceted navigation
Figure 13. Evergreen faceted navigation
Figure 14. Voyager faceted navigation

Keyword Searching

Koha, Evergreen, and WebVoyage all present a simple keyword search box with a link to the advanced search (see figures 3–5).

Figure 3. Koha: state-of-the-art user interface
Figure 4. Evergreen: state-of-the-art user interface
Figure 5. Voyager: state-of-the-art user interface

Information Technology and Libraries | September 2010

Relevancy

Neither Koha, Evergreen, nor WebVoyage provides any evidence of meeting the criteria of the next-generation catalog's more inclusive vision of relevancy ranking, such as accounting for an item's popularity or allowing user tags. Koha uses Index Data's Zebra program for its relevance ranking, which "reads structured records in a variety of input formats . . . and allows access to them through exact boolean search expressions and relevance-ranked free-text queries."19 Evergreen's DokuWiki states that the base relevancy score is determined by the cover density of the searched terms. After this base score is determined, items may receive score bumps based on word order, matching on the first word, and exact matches, depending on the type of search performed.20 These statements do not indicate that either Koha or Evergreen goes beyond the traditional relevancy-ranking methods of legacy systems such as WebVoyage.

Figure 6. Voyager advanced search
Figure 7. Koha advanced search
Figure 8. Evergreen advanced search

Did You Mean . . . ?

Only Evergreen has a true "did you mean . . . ?" feature. When no hits are returned, Evergreen provides a suggested alternate spelling ("maybe you meant . . . ?") as well as a suggested additional search ("you may also like to try these related searches . . ."). Koha has a spell-check feature, but it automatically normalizes the search term and does not give the option of choosing a different one. This is not the same as a "did you mean . . . ?" feature as defined above. While the normalizing process may be seamless, it takes the power of choice away from the user and may be problematic if a particular alternative spelling or misspelling is searched purposefully, such as "womyn." (When "womyn" is searched as a keyword in the Koha demo OPAC, 16,230 hits are returned. This catalog does not appear to contain the term as spelled, which is why it is normalized to "women." The fact that the term does not appear as-is may not be transparent to the searcher.) With normalization, the user may also be unaware that any mistake in spelling has occurred, and the number of hits may differ between the correct spelling and the normalized spelling, potentially affecting discovery. The normalization feature also only works with particular combinations of misspellings, where letter order affects whether a match is found. Otherwise the system returns a "no result found!" message with no suggestions offered. (Try "homoexuality" vs. "homoexsuality." In Koha's demo OPAC, the former, with a missing "s," yields 553 hits, while the latter, with a misplaced "s," yields none.) However, Koha is a step ahead of WebVoyage, which has no built-in spell checker at all. If a search fails, the system returns the message "search resulted in no hits." See figures 15–17.

Figure 15. Evergreen: did you mean . . . ?
Figure 16. Koha: did you mean . . . ?
Figure 17. Voyager: did you mean . . . ?

Recommendations/Related Materials

None of the three online catalogs can recommend materials for users.

User Contributions

Koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. In Koha's OPAC, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or tag's frequency of use. All the tags in a tag cloud serve as hyperlinks to library materials. Users can write their own reviews to complement the Amazon reviews. All user-added reviews, descriptions, and comments have to be approved by a librarian before they are finalized for display in the OPAC. Nevertheless, the user contribution features in the Koha OPAC are not easy to use. It may take many clicks before a user can figure out how to add or edit text. It requires user login, and the system cannot keep track of the search hits after a login takes place. Therefore the user contribution features of Koha need improvement. See figure 18.

Figure 18. Koha user contributions

RSS Feeds

Koha provides RSS feeds, while Evergreen and WebVoyage do not.

Conclusion

Table 1 is a summary of the comparisons in this paper. These comparisons show that the Koha OPAC has six out of the ten compared features for the next-generation catalog, plus two halves. Its full-fledged features include a state-of-the-art web interface, enriched content, faceted navigation, a simple keyword search box, user contribution, and RSS feeds. The two halves indicate the existence of a feature that is not fully developed. For instance, "did you mean . . . ?" in Koha does not work the way the next-generation catalog is envisioned. In addition, Koha has the capability of linking journal titles to full text via Serials Solutions, while the other two OPACs only display holdings information.

Evergreen falls into second place, providing four out of the ten compared features: a state-of-the-art interface, enriched content, a keyword search box, and "did you mean . . . ?" WebVoyage, the Voyager OPAC from Ex Libris, comes in third, providing only three out of the ten features for the next-generation catalog. Based on the evidence, Koha's OPAC is more advanced and innovative than Evergreen's or Voyager's. Among the three catalogs, the open-source OPACs compare more favorably to the ideal next-generation catalog than the proprietary OPAC. However, none of them is capable of federated searching. Only Koha offers faceted navigation. WebVoyage does not even provide a spell checker. The ILS OPAC still has a long way to go toward the next-generation catalog. Though this study samples only three catalogs, hopefully the findings will provide a glimpse of the current state of open-source versus proprietary catalogs.

ILS OPACs are not comparable in features and functions to stand-alone OPACs, also referred to as "discovery tools" or "layers." Some discovery tools, such as Ex Libris' Primo, also are federated search engines and are modeled after the next-generation catalog. Recently they have become increasingly popular because they are bolder and more innovative than ILS OPACs.
Two of the best stand-alone open-source OPACs are Villanova University's VuFind and Oregon State University's LibraryFind.21 Both boast eight out of ten features of the next-generation catalog.22 Technically, it is easier to develop a new stand-alone OPAC with all the next-generation catalog features than to mend old ILS OPACs. As more and more libraries grow disappointed with their ILS OPACs, more discovery tools will be implemented. Vendors will stop improving ILS OPACs and concentrate on developing better discovery tools. The fact that ILS OPACs are falling behind current trends may eventually bear no significance for libraries, at least for the ones that can afford the purchase or implementation of a more sophisticated discovery tool or stand-alone OPAC. Certainly, small and public libraries that cannot afford a discovery tool or a programmer for an open-source OPAC overlay will suffer, unless market conditions change.

Table 1. Summary: features of the next-generation catalog (yes = present; no = absent; partial = present but not fully developed)

Feature                                             Koha      Evergreen  Voyager
Single point of entry for all library information   partial   no         no
State-of-the-art web interface                      yes       yes        yes
Enriched content                                    yes       yes        yes
Faceted navigation                                  yes       no         no
Keyword search                                      yes       yes        yes
Relevancy                                           no        no         no
Did you mean . . . ?                                partial   yes        no
Recommended/related materials                       no        no         no
User contribution                                   yes       no         no
RSS feed                                            yes       no         no

References

1. Tanja Merčun and Maja Žumer, "New Generation of Catalogues for the New Generation of Users: A Comparison of Six Library Catalogues," Program: Electronic Library & Information Systems 42, no. 3 (July 2008): 243–61.
2. Eric Lease Morgan, "A 'Next-Generation' Library Catalog—Executive Summary (Part #1 of 5)," online posting, July 7, 2006, LITA Blog: Library Information Technology Association, http://litablog.org/2006/07/07/a-next-generation-library-catalog-executive-summary-part-1-of-5/ (accessed Nov. 10, 2008).
3. Marshall Breeding, introduction to "Next Generation Library Catalogs," Library Technology Reports 43, no. 4 (July/Aug. 2007): 5–14.
4. Ibid.
5. Marshall Breeding, "Library Technology Guides: Key Resources in the Field of Library Automation," http://www.librarytechnology.org/lwc-search-advanced.pl (accessed Jan. 23, 2010).
6. Marshall Breeding, "Investing in the Future: Automation Marketplace 2009," Library Journal (Apr. 1, 2009), http://www.libraryjournal.com/article/ca6645868.html (accessed Jan. 23, 2010).
7. Marshall Breeding, "Library Technology Guides: Company Directory," http://www.librarytechnology.org/exlibris.pl?sid=20100123734344482&code=vend (accessed Jan. 23, 2010).
8. Merčun and Žumer, "New Generation of Catalogues."
9. Ibid.
10. Linda Riewe, "Integrated Library System (ILS) Survey: Open Source vs. Proprietary-Tables" (master's thesis, San Jose University, 2008): 2–5, http://users.sfo.com/~lmr/ils-survey/tables-all.pdf (accessed Nov. 4, 2008).
11. Ibid., 26–27.
12. Breeding, introduction.
13. Ibid.; Morgan, "A 'Next-Generation' Library Catalog."
14. Breeding, introduction.
15. Ibid.
16. Ibid.
17. Villanova University, "VuFind," http://vufind.org/ (accessed June 10, 2010); Innovative Interfaces, "Encore," http://encoreforlibraries.com/ (accessed June 10, 2010).
18. Auto-Graphics, "AGent Iluminar," http://www4.auto-graphics.com/solutions/agentiluminar/agentiluminar.htm (accessed June 10, 2010).
19. Breeding, introduction; Morgan, "A 'Next-Generation' Library Catalog."
20. Index Data, "Zebra," http://www.indexdata.dk/zebra/ (accessed Jan. 3, 2009).
21. Evergreen DokuWiki, "Search Relevancy Ranking," http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo&s=core (accessed Dec. 19, 2008).
22. Villanova University, "VuFind"; Oregon State University, "LibraryFind," http://libraryfind.org/ (accessed June 10, 2010).
23. Sharon Q. Yang and Kurt Wagner, "Open Source Stand-Alone OPACs" (Microsoft PowerPoint presentation, 2010 Virtual Academic Library Environment Annual Conference, Piscataway, New Jersey, Jan. 8, 2010).
User-Centered Design of a Web Site | Manzari and Trinidad-Christensen

This study describes the life cycle of a library web site created with a user-centered design process to serve a graduate school of library and information science (LIS). Findings based on a heuristic evaluation and usability study were applied in an iterative redesign of the site to better serve the needs of this special academic library population. Recommendations for the design of web-based services for library patrons from LIS programs are discussed, as well as implications for web sites for special libraries within larger academic library settings.

User-centered design principles were applied to the creation of a web site for the library and information science (LIS) library at the C. W. Post campus of Long Island University. This web site was designed for use by master's degree and doctoral students in the Palmer School of Library and Information Science. The prototype was subjected to a usability study consisting of a heuristic evaluation and usability testing. The results were employed in an iterative redesign of the web site to better accommodate users' needs. This was the first usability study of a web site at the C. W. Post library.

Human-computer interaction, the study of the interaction of human performance with computers, imposes a rigorous methodology on the process of user-interface design. More than an intuitive determination of user-friendliness, a successful interactive product is developed by careful design, testing, and redesign based on the testing outcomes. Testing the product several times as it is being developed, or iterative testing, allows the users' needs to be incorporated into the design. The interface should be designed for a specific community of users and set of tasks to be accomplished, with the goal of creating a consistent, usable product.
The LIS library had a web site that was simply a description of the collection and did not provide access to online specialized resources. A new web site was designed for the LIS library by the incoming LIS librarian, who made a determination of what content might be useful for LIS students and faculty. The goal was to have such content readily accessible in a web site separate from the main library web site. The web site for the LIS library includes:

- access to all online databases and journals related to LIS;
- a general overview of the LIS library and its resources, as well as contact information, hours, and staff;
- a list of all print and online LIS library journal subscriptions, grouped by both title and subject, with links to access the online journals;
- links to other web sites in the LIS field;
- links to other university web pages, including the main library's home page, library catalog, and instructions for remote database access, as well as to the LIS school web site;
- a link to JAKE (Jointly Administered Knowledge Environment), a project by Yale University that allows users to search for periodical titles within online databases, since the library did not have this type of access through its own software.

This information was arranged in four top-level pages with sublevels. Design considerations included making the site both easy to learn and efficient once users were familiar with it. Since classes are taught at four locations in the metropolitan area, the site needed to be flexible enough to serve students at the C. W. Post campus library as well as remotely. The layout of the information was designed to make the web site uncluttered and attractive. Different color schemes were tried and informally polled among users. A version with white text on a black background prompted strong likes or dislikes when shown to users. Although this combination is easy to read, it was rejected because of the strong negative reactions from several users.
Photographs of the LIS library and students were included. The pages were designed with a menu on the left side; fly-out menus were used to access submenus. Where main library pages already existed for information to be included in the LIS web site, such as LIS hours and staff, links to those pages were made instead of re-creating the information in the LIS web site. An attempt was made to render the site accessible to users with disabilities, and pages were made compliant with World Wide Web Consortium (W3C) standards by using their HTML validator and their cascading style sheet validator.1

User-Centered Design of a Web Site for Library and Information Science Students: Heuristic Evaluation and Usability Testing

Laura Manzari and Jeremiah Trinidad-Christensen

Laura Manzari (manzari@liu.edu) is an Associate Professor and Library and Information Science Librarian at the C. W. Post campus of Long Island University, Brookville, N.Y. Jeremiah Trinidad-Christensen (jt2118@columbia.edu) is a GIS/Map Librarian at Columbia University, New York, N.Y.

Information Technology and Libraries | September 2006

Literature Review

Usability is a term with many definitions, varying by field.2 The fields of industrial engineering, product research and development, computer systems, and library science all share the study of human-and-machine interaction, as well as a commitment to users. Dumas and Redish explain it simply: "Usability means that the people who use the product can do so quickly and easily to accomplish their own tasks."3 User-centered design incorporates usability principles into product design and places the focus on the user during project development.
Gould and Lewis cite three principles of user-centered design: an early focus on users and tasks, empirical measurement of product usage, and iterative design to include user input in product design and modification.4 Jakob Nielsen, an often-cited usability engineering specialist, emphasizes that for increased functionality, engineering usability principles should apply to web design, which should be treated as a software development project. He advocates incorporating user evaluation into the design process, first through a heuristic evaluation, followed by usability testing, with a redesign of the product after each phase of evaluation.5 Usability principles have been applied to library web-site design; however, library web-site usability studies often do not include the additional heuristic evaluation recommended by Nielsen.6

In addition to usability, consideration should also be given during the design process to making the web site accessible to people with disabilities. Federal agencies are now required by the Rehabilitation Act to make their web sites accessible to the disabled. Section 508, part 1194.22, of the act enumerates sixteen rules for internet applications to help ensure web-site access for people with various disabilities.7 Similarly, the Web Accessibility Initiative hosted by the W3C works to ensure that accessibility practices are considered in web-site design.
They developed the Web Content Accessibility Guidelines for making web sites accessible to people with disabilities.8 Although articles have been written about usability testing of academic library web sites, very little has been written about usability testing of special-collection web sites for distinct user populations within larger academic settings.9

Heuristic Evaluation Methodology

Heuristic evaluation is a usability engineering method in which a small set of expert evaluators examine a user interface for design problems by judging its compliance with a set of recognized usability principles, or heuristics. Nielsen developed a set of ten widely adopted usability heuristics (see sidebar). After studying the use of individual evaluators as well as groups of varying sizes, Nielsen and Molich recommend using three to five evaluators for a heuristic evaluation.10 The use of multiple experts will catch more flaws than a single expert, but using more than five experts does not produce greater results. In comparisons of heuristic evaluation and usability testing, the heuristic evaluation uncovered more of the minor problems, while usability testing uncovered more major, global problems.11 Since each method tends to uncover different usability problems, it is recommended that both methods be used complementarily, particularly with an iterative design change between the heuristic evaluation and the usability testing.

For the heuristic evaluation, four people with expertise in web-site design and human-computer interaction were approached from the Palmer LIS school faculty and Ph.D. program. Three agreed to participate. They were asked to familiarize themselves with the web site and evaluate it according to Nielsen's ten heuristics, which were provided to them.

Heuristic Evaluation Results

The evaluators were all in agreement that the language was appropriate for LIS students. One evaluator said that if new students were not familiar with some of the terms, they soon would be.
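The reasoning behind Nielsen and Molich's three-to-five-evaluator recommendation can be illustrated with a small sketch. The heuristic labels and problem descriptions below are hypothetical, not the study's actual findings; the point is only that each evaluator catches a subset of the problems, so pooling a few reports covers more than any individual does.

```python
# Illustrative sketch (hypothetical data): pooling heuristic-evaluation
# reports. Each evaluator finds only some problems; the union across
# a handful of evaluators catches more flaws than any single expert.

def pooled_problems(reports):
    """Union of reported problems, keyed by the heuristic each violates."""
    found = {}
    for report in reports:
        for heuristic, problem in report:
            found.setdefault(heuristic, set()).add(problem)
    return found

evaluator_reports = [
    [("consistency", "menu options differ page to page"),
     ("match with real world", "jargon: 'JAKE' unexplained")],
    [("consistency", "menu options differ page to page"),
     ("user control", "no quick return to home page")],
    [("match with real world", "jargon: 'JAKE' unexplained"),
     ("aesthetic design", "journal subject list too long")],
]

pooled = pooled_problems(evaluator_reports)
total = sum(len(problems) for problems in pooled.values())
# Three evaluators together surface 4 distinct problems,
# though no single evaluator reported more than 2.
```

Adding a fourth or fifth report typically adds a few more unique problems; beyond that the union grows slowly, which is why larger panels stop paying off.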
Another thought JAKE, the tool to access full text, might not be clear to students at first, but that the LIS web-site explanation was fine the way it was. They were also in agreement that the web site was well designed. Comments included: "The purpose and description of each page is short and to the point, and there is a good, clean, viewable page for the users"; "the site was well designed and not over designed"; "very clear and user friendly"; "excellent example of limiting unnecessary irrelevant information." The only page to receive a "poor layout" comment was the lengthy subject list of journals, though no suggestions for improvement were made.

Concern was expressed about links to other web sites on campus. One evaluator thought new students might be confused about the relationship between Long Island University, C. W. Post, and the Palmer School. Two evaluators thought links to the main library's web site could cause confusion because of the different design and layout. A preference for the design of the LIS library web site over the main library and Palmer School web sites was expressed. To eliminate some confusion, the menu options for other campus web sites were moved down to a separate menu directly below the menu of LIS web pages. For additional clarity, some of the main library pages were re-created in the style of the LIS pages instead of linking to the original page.

The evaluators made several concrete suggestions for menu changes, which were included in the redesign. It was suggested that several menu options were unclear and needed clarification, so additional text was added for clarity at the expense of brevity. Long Island University's online catalog is named LIUCat and was listed that way on the menu. New students might not be familiar with this name, so the menu label was changed to LIUCat (Library Catalog).
For the link to JAKE, a description, "Find periodicals in online databases," was added for clarification. It was also suggested that the link to the main library web page for all databases could cause confusion, since the layout and design of that page is different. The wording was changed to "All databases (located in the C. W. Post library web site)."

Menu options were originally arranged in order of anticipated use (see figure 1). Thus, the order of menu options from the LIS home page was databases, journals, library catalog, other web sites, Palmer School, and main library. Evaluators suggested that putting the option for the LIS home page first would give users an easy "emergency exit" to return to the home page if they were lost. The original menu options also varied from page to page. For example, menu options on the database page referred only to pages that users might need while doing database searches. At the suggestion of the evaluators, the menu options were changed to be consistent on every page (see figure 2). A redesign based on these results was completed and posted to the internet for public use (see figure 3).

Figure 1. Original menu
Figure 2. Revised menu

Sidebar: Jakob Nielsen's Usability Heuristics

1. Visibility of system status: the system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
2. Match between system and the real world: the system should speak the user's language, with words, phrases, and concepts familiar to the user rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
3. User control and freedom: users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
4. Consistency and standards: users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
5. Error prevention: even better than good error messages is a careful design that prevents problems from occurring in the first place.
6. Recognition rather than recall: make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
7. Flexibility and efficiency of use: accelerators, unseen by the novice user, may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
8. Aesthetic and minimalist design: dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
9. Help users recognize, diagnose, and recover from errors: error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
10. Help and documentation: even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.12

Usability Testing Methodology

Usability testing is an empirical method for improving design. Test subjects are gathered from the population who will use the product and are asked to perform real tasks using the prototype while their performance and reactions to the product are observed and recorded by an interviewer. This observation and recording of behavior distinguishes usability testing from focus groups. Observation allows the tester to see when and where users become frustrated or confused. The goal is to uncover usability problems with the product, not to test the participants themselves. The data gathered are then analyzed to recommend changes to fix usability problems. In addition to recording empirical data such as the number of errors made or the time taken to complete tasks, active intervention allows the interviewer to question participants about the reasons for their actions as well as about their opinions of the product. In fact, subjects are asked to verbalize their thought processes as they complete the tasks using the interface. Test subjects are usually interviewed individually and are all given the same pretest briefing from a script with a list of instructions followed by tasks representing actual use. Test subjects are also asked questions about their likes and dislikes. In most situations, payment or other incentives are offered to help recruit subjects. Four or five subjects will reveal 80 percent of usability problems.13

Messages were sent to students via the Palmer School's mailing lists requesting volunteers. A ten-dollar gift certificate to a bookstore was offered as an inducement to recruitment. Input was desired from both master's degree and doctoral students. The first nine volunteers to respond, all master's degree students, were accepted. This group included students from both the main and satellite campuses. No Ph.D. students volunteered to participate at first, citing busy schedules, but eventually a doctoral student was recruited. Testing was conducted in computer labs at the library, at the Palmer School, and at the Manhattan satellite campus.
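The kind of empirical record an active-intervention usability test produces can be sketched as follows. The task names, timings, and results below are hypothetical, not the data from this study; the sketch only shows the per-subject, per-task structure (success, error count, time) that such testing yields for later analysis.

```python
# Minimal sketch (hypothetical data, not this study's results) of the
# empirical records a usability test produces: per-subject, per-task
# success, error counts, and completion times.

from dataclasses import dataclass

@dataclass
class Observation:
    subject: str
    task: str
    completed: bool
    errors: int
    seconds: float

def completion_rate(observations, task):
    """Fraction of subjects who completed the given task."""
    runs = [o for o in observations if o.task == task]
    return sum(o.completed for o in runs) / len(runs)

log = [
    Observation("S1", "find refereed journal", True, 1, 95.0),
    Observation("S2", "find refereed journal", True, 0, 60.0),
    Observation("S3", "find book review", False, 3, 240.0),
    Observation("S4", "find book review", True, 2, 180.0),
]

rate = completion_rate(log, "find book review")  # 0.5
```

Alongside such counts, the interviewer's notes on where subjects hesitated or verbalized confusion supply the qualitative half of the analysis.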
Demographic information was gathered regarding users' gender, age range, university status, familiarity with computers, the internet, and the LIS library, as well as the type of internet connection and browser usually used. The subjects were given eight tasks to complete using the web site. The tasks reflected both the type of assignment a student might receive in class and the type of information they might seek on the LIS web site on their own. The questions were designed to test the usability of different parts of the web site.

Usability Testing Results

The first task tested the print journals page and asked whether the LIS library subscribes to a specific journal and whether it is refereed. (The web site uses an asterisk next to a journal title to indicate that it is refereed.) All subjects were able to easily find that the LIS library does hold the journal title. Although it was not initially obvious that the asterisk was a notation indicating that the journal was refereed, most of the subjects eventually found the explanatory note. Many of the subjects did not know what a refereed journal was, and some asked if a definition could be provided on the site.

For the second task, subjects needed to use JAKE to find the full text of an article. None of the students were familiar with JAKE, but they were able to use the LIS web site to gain an understanding of its purpose and to access it. The third task asked subjects to find a library association, which required using the other web sites page. All subjects demonstrated an understanding of how to use this page and found the information.

The fourth task tested the full-text databases page. Only one subject actually used this page to complete the task. The rest used the all databases link to the main library's database list. That link appears above the link to full-text databases, and most subjects chose that link without looking at the next menu option. Several subjects became confused when they were taken to the main library's page, just as the evaluators had predicted. Even though wording was added warning users that they were leaving the LIS web site, most subjects did not read it and wondered why the page layout changed and was not as clear. They also had trouble navigating back to the LIS web site from the main library web site.

Figure 3. Final home page

The fifth task tested the journals by subject page. This task took longer for most of the subjects to answer, but all were able to use the page successfully to find a journal on a given subject. The sixth task required using the LIS home page, and everyone easily used it to find the operating hours. The seventh task required subjects to find an online journal title that could be accessed from the electronic journals page. All subjects navigated this page easily. The final task asked subjects to find a book review. Most subjects did not look at the page for library and information sciences databases to access the Books in Print database, saying they did not think it would be included there. Instead, they used the link to the main library's database page. One subject was not able to complete this task.

Problems primarily occurred during testing when subjects left the LIS page to use a non-library-science database located on the main web site. Subjects had problems getting back to the LIS site from the main library site. While performing tasks, some subjects would scroll up and down long lists instead of using the toolbars provided to bring the user to an exact location on the page. Some preferred using the back button instead of using the LIS web-site menu to navigate. These seemed to be individual styles of using the web and not usability problems with the site.
Several people consistently used the menu to return to the LIS home page before starting each new task, even though they could have navigated directly to the page they needed, making a return to the home page unnecessary. This validated the recommendation from the heuristic study that the link to the home page always be the first menu option, giving users a comfortable safety valve when they get lost.

The final questions asked subjects for their opinions on what they did and did not like about the web site, as well as any suggestions for improving the site. All subjects responded that they liked the layout of the pages, calling them uncluttered, clean, attractive, and logical. There were very few suggestions for improving the site. One person asked that contact information be included in the menu options in addition to its location right below the menu on the LIS home page. Another participant suggested adding class syllabi to the web site each semester, listing required texts along with a link to an online bookstore. Some of the novice users asked for explanations of unfamiliar terms such as "refereed journals." One participant suggested including a search engine instead of using links to navigate the site. This was considered during the initial site design but was not included, since the site did not have a large number of pages. However, a search engine may be worth including.

The one doctoral student had previously used only the main library's web page to access databases. Originally, he said he did not see the advantage of a site devoted to information science sources for doctoral candidates, since that program is more multidisciplinary. However, after completing the usability study, the student concluded that the LIS web site was useful. He suggested that it should be publicized more to doctoral candidates and that it be more prominently highlighted on the main library web site.
Though the questions asked were about the LIS web site, several subjects complained about the layout of the main library web site and suggested that it have better linking to the LIS web site to enable it to be accessed more easily.

Conclusions

Iterative testing and user-centered design resulted in a product that testing revealed to be easy to learn and efficient to use, and about which subjects expressed satisfaction. Based on findings that some students had not even been aware of the existence of the LIS web site, greater emphasis is now given to the web site and its features during new-student orientations. The biggest problem users had was navigating from the web pages of the main library back to the LIS site. It was suggested that the LIS site be highlighted more prominently on the main library web site. Some users were confused by the different layouts of the two sites, but no one expressed a preference for the design used by the main library web site. Despite this confusion, subjects overwhelmingly expressed positive feedback about having a specialized library site serving their specific needs.

Issues regarding web-site design can be problematic for smaller specialized libraries within larger institutions. In this case, some of the problems navigating between the sites could be resolved by changes to the main library site. The design of the LIS web site was preferred over the main campus web site by both the heuristic evaluators and the students in the usability test. However, designers of a main library web site might not be receptive to suggestions from a specialized or branch library. Although consistency in design would eliminate confusion, requiring the special collection's web site to follow a design set by the main institution could be a loss for users. In this instance, the main site was designed without user input, whereas the specialized library serving a smaller population was able to be more dynamic and responsive to its users.
finding an appropriate balance for a site used by students new to the field as well as advanced students is a challenge. although the students in the study were all experienced computer and web users, their familiarity with basic library concepts varied greatly. a few novice users expressed some confusion as to the difference between journals and index databases. there actually was a description of each of these sources on the site but it was not read. (the subjects barely read any of the site’s text, so it can be difficult to make some points clearer when users want to navigate quickly without reading instructions. several subjects who did not bother to read text on the site still suggested having more notes to explain unfamiliar terms. however, if the site becomes too overloaded with explanations of library concepts, it could become annoying for more advanced users.) a separate page with a glossary is a possibility—based on the study, however, it will probably not be read. another possibility is a handout for students that could have more text for new users without cluttering the web site. having such a handout would also serve to publicize the site. there was some concern prior to the study that offering more advanced features, such as providing access to jake or indicating which journals are refereed, might be off-putting for new students; therefore, test questions were designed to gauge reactions to these features. most students in the study did express some intimidation at not being familiar with these concepts. however, all the subjects eventually figured out how to use jake and, once they tried it, thought it was a good idea to include it. even new students who had the most difficulty were still able to navigate and learn from the site to be able to use it efficiently. an online survey was added to the final design to allow continuous user input.
the site consistently receives positive feedback through these surveys. it was planned that responses could be used to continually assess the site and ensure that it is kept responsive and up-to-date; however, specific suggestions have not yet been forthcoming. how valuable was usability testing to the web-site design? several good suggestions were made and implemented, and the process confirmed that the site was well designed. it provided some insight into how subjects used the web site that had not been anticipated by the designers. since usability studies are fairly easy and inexpensive to conduct, it is probably a step worth taking during the web-site design process even if it results in only minor changes to the design. references and notes 1. w3c, “the w3c markup validation service,” validator.w3.org (accessed nov. 1, 2005); w3c, “the w3c css validation service,” jigsaw.w3.org/css-validator (accessed nov. 1, 2005). 2. see carol m. barnum, usability testing and research (new york: longman international, 2002); alison j. head, “web redemption and the promise of usability,” online 23, no. 6 (1999): 20–29; international organization for standardization, ergonomic requirements for office work with visual display terminals. part 11: guidance on usability—iso 9241-11 (geneva: international organization for standardization, 1998); judy jeng, “what is usability in the context of the digital library and how can it be measured?” information technology and libraries 24, no. 2 (2005): 47–52; jakob nielsen, usability engineering (boston: academic, 1993); ruth ann palmquist, “an overview of usability for the study of users’ web-based information retrieval behavior,” journal of education for library and information science 42, no. 2 (2001): 123–36. 3. joseph s. dumas and janice c. redish, a practical guide to usability testing (portland: intellect bks., 1999), 4. 4. john d. gould and clayton h.
lewis, “designing for usability: key principles and what designers think,” communications of the acm 28, no. 3 (1985): 300–11. 5. jakob nielsen, “heuristic evaluation,” in jakob nielsen and robert l. mack, eds., usability inspection methods (new york: wiley, 1994), 25–62. 6. see denise t. covey, usage and usability assessment: library practices and concerns (washington, d.c.: digital library federation, 2002); nicole campbell, usability assessment of library-related web sites (chicago: ala, 2001); kristen l. garlock and sherry piontek, designing web interfaces to library services and resources (chicago: ala, 1999); anna noakes schulze, “user-centered design for information professionals,” journal of education for library and information science 42, no. 2 (2001): 116–22; susan m. thompson, “remote observation strategies for usability testing,” information technology and libraries 22, no. 3 (2003): 22–32. 7. general services administration, “section 508: section 508 standards,” www.section508.gov/index.cfm?fuseaction=content&id=12#web (accessed nov. 1, 2005). 8. w3c, “web content accessibility guidelines 2.0,” www.w3.org/tr/wcag20 (accessed nov. 1, 2005). 9. see susan augustine and courtney greene, “discovering how students search a library web site: a usability case study,” college and research libraries 63, no. 4 (2002): 354–65; brenda battleson, austin booth, and jane weintrop, “usability testing of an academic library web site: a case study,” journal of academic librarianship 27, no. 3 (2001): 188–98; janice krueger, ron l. ray, and lorrie knight, “applying web usability techniques to assess student awareness of library web resources,” journal of academic librarianship 30, no. 4 (2004): 285–93; thura mack et al., “designing for experts: how scholars approach an academic library web site,” information technology and libraries 23, no. 1 (2004): 16–22; mark shelstad, “content matters: analysis of a web site redesign,” oclc systems & services 21, no.
3 (2005): 209–25; robert l. tolliver et al., “web site redesign and testing with a usability consultant: lessons learned,” oclc systems & services 21, no. 3 (2005): 156–67; dominique turnbow et al., “usability testing for web redesign: a ucla case study,” oclc systems & services 21, no. 3 (2005): 226–34; leanne m. vandecreek, “usability analysis of northern illinois university libraries’ web site: a case study,” oclc systems & services 21, no. 3 (2005): 181–92. 10. jakob nielsen and rolf molich, “heuristic evaluation of user interfaces,” in proceedings of the acm chi ’90 (new york: association for computing machinery, 1990), 249–56. 11. robin jeffries et al., “user interface evaluation in the real world: a comparison of a few techniques,” in proceedings of the acm chi ’91 (new york: association for computing machinery, 1991), 119–24; jakob nielsen, “finding usability problems through heuristic evaluation,” in proceedings of the acm chi ’92 (new york: association for computing machinery, 1992), 373–86. 12. jakob nielsen, “heuristic evaluation,” 25–62. 13. jeffrey rubin, handbook of usability testing: how to plan, design, and conduct effective tests (new york: wiley, 1994); jakob nielsen, “why you only need to test with five users,” alertbox, mar. 19, 2000, www.useit.com/alertbox/20000319.html (accessed nov. 1, 2005). letter from the editor: september 2021 kenneth j. varnum information technology and libraries | september 2021 https://doi.org/10.6017/ital.v40i3.13859 in the editorial section of this issue, we have two columns to share. the september editorial board thoughts essay is by paul swanson, “building a culture of resilience in libraries,” which reflects on the lessons of covid-driven flexibility and suggests that a culture of resilience in our libraries will help us more easily adapt to these and other emerging changes we will inevitably encounter.
that is followed by carole williams’ public libraries leading the way column, “delivering: automated materials handling for staff and patrons,” in which she discusses the effects of an automated materials handling system on both the staff and patrons of the charleston county (sc) public library. in peer-reviewed content, we have a diverse set of articles on a range of topics: bias mitigation in metadata; accessibility of pdf documents; two articles on automated classification of different kinds of texts; two articles with lessons learned due to our abrupt move to remote service; and a case study on the importance of product ownership. 1. mitigating bias in metadata: a use case using homosaurus linked data / juliet hardesty and allison nolan 2. accessibility of tables in pdf documents: issues, challenges and future directions / nosheen fayyaz, shah khusro, and shakir ullah 3. text analysis and visualization research on the hetu dangse during the qing dynasty of china / zhiyu wang, jingyu wu, guang yu, and zhiping song 4. topic modeling as a tool for analyzing library chat transcripts / hyunseung koh and mark fienup 5. expanding and improving our library’s virtual chat service: discovering best practices when demand increases / parker fruehan and diana hellyar 6. a rapid implementation of a reserve reading list solution in response to the covid-19 pandemic / matthew black and susan powelson 7. product ownership of a legacy institutional repository: a case study on revitalizing an aging service / mikala narlock and don brower kenneth j.
varnum, editor varnum@umich.edu september 2021 information technology and libraries | march 2010 michelle frisque (mfrisque@northwestern.edu) is lita president 2009–10 and head, information systems, northwestern university, chicago. president’s message: join us at the forum! the first lita national forum i attended was in milwaukee, wisconsin. it seems like it was only a couple of years ago, but in fact nine national forums have since passed. i was a new librarian, and i went on a lark when a colleague invited me to attend and let me crash in her room for free. i am so glad i took her up on the offer because it was one of the best conferences i have ever attended. it was the first conference that i felt was made up of people like me, people who shared my interests in technology within the library. the programming was a good mix of practical know-how and mind-blowing possibilities.
my understanding of what was possible was greatly expanded, and i came home excited and ready to try out the new things i had learned. almost eight years passed before i attended my next forum in cincinnati, ohio. after half a day i wondered why i had waited so long. the program was diverse, covering a wide range of topics. i remember being depressed and outraged at the current state of internet access in the united states as reported by the office for information technology policy. i felt that surge of recognition when i discovered that other universities were having a difficult time documenting and tracking the various systems they run and maintain. i was inspired by david lankes’s talk, “obligations of leadership.” if you missed it you can still hear it online. it is linked from the lita blog (http://www.litablog.org). while the next forum may seem like a long way off to you, it is in the forefront of my mind. the national forum 2010 planning committee is busy working to make sure this forum lives up to the reputation of forums past. this year’s forum takes place in atlanta, georgia, september 30–october 3. the theme is “the cloud and the crowd.” program proposals are due february 19, so i cannot give you specifics about the concurrent sessions, but we do hope to have presentations about projects, plans, or discoveries in areas of library-related technology involving emerging cloud technologies; software-as-a-service, as well as social technologies of various kinds; using virtualized or cloud resources for storage or computing in libraries; library-specific open-source software (oss) and other oss “in” libraries; technology on a budget; using crowdsourcing and user groups for supporting technology projects; and training via the crowd. each accepted program is scheduled to maximize the impact for each attendee. programming ranges from five-minute lightning talks to full-day preconferences.
in addition, on the basis of attendee comments from previous forums, we have also decided to offer thirty- and seventy-five-minute concurrent sessions. these concurrent sessions will be a mix of traditional single- or multispeaker formats, panel discussions, case studies, and demonstrations of projects. finally, poster sessions will also be available. while programs such as the keynote speakers, lightning talks, and concurrent sessions are an important part of the forum experience, so is the opportunity to network with other attendees. i know i have learned just as much talking with a group of people in the hall between sessions, during lunch, or at the networking dinners as i have sitting in the programs. not only is it a great opportunity to catch up with old friends, you will also have the opportunity to make new ones. for instance, at the 2009 national forum in salt lake city, utah, approximately half of the people who attended were first-time attendees. the national forum is an intimate event whose attendance ranges between 250 and 400 people, thus making it easy to forge personal connections. attendees come from a variety of settings, including academic, public, and special libraries; library-related organizations; and vendors. if you want to meet the attendees in a more formal setting you can attend a networking dinner organized on-site by lita members. this year the dinners were organized by the lita president, lita past president, lita president-elect, and a lita director-at-large. if you have not attended a national forum or it has been a while, i hope i have piqued your interest in coming to the next national forum in atlanta. registration will open in may! the most up-to-date information about the 2010 forum is available at the lita website (http://www.lita.org). i know that even after my lita presidency is a distant memory, i will still make time to attend the lita national forum. i hope to see you there! book reviews automation in libraries, by r. t.
kimber. oxford: pergamon press, 1968. 140 pp. $6.00. many books have been published in recent years on the subject of library automation. very few of them, however, have succeeded in making meaningful contributions to a better understanding of the subject. this volume has made a sincere effort to be one of the few. although library automation is an ambiguous term which lacks precise definition, it is used here clearly to mean the use of computers in libraries. the book is intended for those with no computer background but who are familiar with library operations. it attempts to give a good introduction to current practices in library automation and a fairly detailed account of the state of the art. in the first chapter, “libraries and automation,” mr. kimber discusses the relationship between the library and the computer. seeing the computer as a means of performing human clerical functions, he points out two important attitudes that must be observed: first, one must not change to a computer system just for the sake of changing, and second, one must be willing to change if the change means improvement. the monetary worth of the computer in the library is difficult to express because the end result is not increased profit but better service. since benefits from computer operations can be expressed in time and effort saved, these are the means of monetary comparison the author suggests. he also observes that although there are many good reasons for wanting computerized operations, some of these are merely emotional. chapter ii, “introduction to computers,” is written by anne h. boyd, lecturer in computation at queen’s university of belfast. miss boyd gives a brief review of the development and use of computers and discusses the fundamentals of computer systems. the next four chapters by mr.
kimber present computerized systems for various library activities: chapter iii, “ordering and acquisitions”; chapter iv, “circulation control”; chapter v, “periodicals listing and accessioning”; chapter vi, “catalogues and bibliographies.” each chapter, with a minimum of technical terminology, gives a good account of what is involved in automating a particular operation. his treatment is very informative on these matters. in his final chapter (chapter vii, “the present state of automation in libraries”) kimber discusses current trends of library automation and gives examples of libraries which use computers. his list is admittedly not comprehensive, but it does provide a comparison to the “ideal” systems he has described in the earlier chapters. in commenting on the future of computerized library systems, he sees these systems as an escape from the problems of everyday library operations. journal of library automation vol. 3/1, march 1970. this book should be a good addition to the current books on library automation. one unfortunate aspect, however, appears to be an absence of treatment regarding the psychological impact of automation on librarians and users, which is certainly one important aspect to be considered when automation of a system is proposed. also, at times the author, in attempting to simplify his discussion, has made a generalized statement without fuller explanation. this could be misleading and tend to confuse the uninitiated reader. these deficiencies are not of major consequence and do not prejudice the total work, but care should be taken in reading. sul h. lee 1968 international directory of research and development scientists. philadelphia: institute for scientific information, inc., 1969. 1,352 pages (approx.). $60.00.
the second issue of the “international directory of research and development scientists” (idr&ds) lists the names and organizational addresses of 152,648 authors whose papers were listed in either 4o implications of marc, and the library of congress systems studies. (this paper includes twenty-eight pages of appendices, mostly charts.) two additional papers include a discussion of the future of, and a tabulation of trends affecting, library automation. much of the material in these non-survey papers is reported more completely elsewhere and some of it now seems dated. the material presented in this publication must have produced a highly effective educational institute in 1967. in 1969, its value is at best as a first reader in library automation but not as the state-of-the-art review the title proclaims. charles t. payne computers and data processing: information sources, by chester morrill, jr. an annotated guide to the literature, associations, and institutions concerned with input, throughput, and output of data. detroit: gale research co., [1969]. 275 pp. $8.75. (management information guide, 15) this latest volume in the management information guide series should prove as useful as its predecessors, offering to those persons interested in or concerned with computers and data processing (and who now is not?) an organized and extensive survey of the basic and necessary sources of available information. thus the text is for the most part an annotated bibliography of pertinent references arranged in broad categories, each category prefaced with a paragraph or two of comment. this is in the style of mr. morrill’s earlier contribution to the series, systems and procedures including office management, 1967, and, in general, that of all the volumes of the series.
section 7, “operating,” is the largest category, some forty pages of references subdivided into “manuals,” “digital computers,” “data transmission,” “fortran,” “software,” and the like. section 9, entitled “front office references,” is of particular interest to the reference librarian, since it serves as a guide to desirable dictionaries, handbooks, and abstracting services in the fields of automation and data processing. individual annotations are usually brief, informative, and on occasion evaluative. they give evidence of considerable skill in the art of capsule characterization. the prefatory paragraphs and notes to each section characterize the particular topic as successfully and succinctly as do the individual annotations. the preface to section 3, “personnel,” is particularly felicitous. coverage is ample not only as to the subjects chosen but also as to numbers of references under individual subjects. an important thirty pages of appendices lists additional sources of information: associations, manufacturers, seminars, publishers, placement firms, etc., particularly valuable to the businessman or government official as a desk or front-office reference book, although the librarian will also find it of value in providing specific information for his clientele. in all, this is a highly competent and very welcome addition to the series as well as to the ranks of special reference sources so necessary to the proper practice of the reference librarian’s art. i think of crane’s a guide to the literature of chemistry and white’s sources of information in the social sciences and consider the author quite comfortable in their company as well as in that of his colleagues in the series. in addition, he evinces in his annotations and prefaces a wit, a turn of phrase, and a capacity for direct statement that inform and delight the user. he displays an expertise in the fields of management and computer science, and one feels one can rely on his selection and judgment.
eleanor r. devlin centralized book processing: a feasibility study based on colorado academic libraries, by lawrence e. leonard, joan m. maier, and richard m. dougherty. metuchen, n.j.: scarecrow press, 1969. 401 pp. $10.00. in october 1966 the national science foundation awarded a grant to the university of colorado libraries and the colorado council of librarians for research in the area of centralized processing. the project was in three phases. phase i involved an examination of the feasibility of establishing a book-processing center to serve the needs of the nine state-supported college and university libraries in colorado (which range in size from the university of colorado, with 805,959 volumes as of june 30, 1967, to metropolitan state college, a new institution with 8,310 volumes). phase ii involved a simulation study of the proposed center, while phase iii involved an operational book-processing center on a one-year experimental basis. this book summarizes the results of the first two phases of the study. phase i involved a detailed time-and-cost analysis of the acquisition, cataloging, and bookkeeping procedures in the nine participating libraries, with resultant processing costs per volume which are both convincing and somewhat startling, ranging as they do from $2.67 to $7.71 per volume. the operating specifications of the proposed book-processing center are then set forth and a mathematical model for simulating its operations under a variety of alternative conditions is prepared. the conclusions are less than surprising: “a centralized book processing center to serve the needs of the academic libraries in colorado is a viable approach to book processing.” project benefits are enumerated, in the areas of cost savings, time-lag reductions, and the more efficient utilization of personnel.
unfortunately, while many of the conclusions are buttressed by a dazzling array of tables and mathematical formulas (how can most librarians really argue with a regression analysis correlation coefficient matrix?), some of the most important savings cited are based on simple guesses, in some cases very simple guesses. to mention just two examples: 1) we are told that “a discount advantage expected through the use of combined ordering and a larger volume of ordering is conservatively estimated at 5% ...” (perhaps, but what is this based on?) 2) in the area of time-lag reduction, “the greatest savings in time will accrue when the center is able to purchase materials from a vendor who has built up his book stock to reflect the needs of academic institutions. up to now, vendors have been unwilling to do this because there is insufficient profit motive.” would nine libraries combining together change this profit picture? it is unfortunate that this report could not have waited on phase iii, the completion of the one-year trial of the operational center which was to have been ready in august 1969, so that we could see just how the predictions for the center worked out in practice. as it stands, however, the book is a valuable study in library systems analysis and design, and its identification and quantification of the various technical processing activities can yield real benefits to librarians everywhere, be they ever so decentralized. norman dudley a guide to a selection of computer-based science and technology reference services in the u.s.a. chicago: american library association, 1969. 29 pages. $1.50. this guide is an attempt to bring together those reference publications which are also available in machine-readable form. as a “selection” it is limited to eighteen sources from government, professional, and private organizations.
the guide is the result of a survey undertaken in 1968 by the science and technology reference services committee of the american library association reference services division. the committee was composed of elsie bergland, john mcgowan, william page, joseph paulukonis, margaret simonds, george caldwell, robert krupp, and richard snyder. each entry is broken down into three units: 1) the characteristics of the data base, 2) the equipment configuration, and 3) the use of the file. subject headings under characteristics of the data base include subject matter, literature surveyed, types of material covered, etc. the equipment configuration section describes computer model, core, operating systems, and programming language. the use of the file section covers potential uses of the data base by the producer and the subscriber. unfortunately for publications of this sort, they become out of date rather quickly. the continuing series, the directory of computerized information in science and technology, is updated periodically and is a very useful reference tool in this field. gerry d. guthrie orthographic error patterns of author names in catalog searches renata tagliacozzo, manfred kochen, and lawrence rosenberg: mental health research institute, the university of michigan, ann arbor, michigan an investigation of error patterns in author names based on data from a survey of library catalog searches. position of spelling errors was noted and related to length of name. probability of a name having a spelling error was found to increase with length of name. nearly half of the spelling mistakes were replacement errors; following, in order of decreasing frequency, were omission, addition, and transposition errors. computer-based catalog searching may fail if a searcher provides an author or title which does not match with the required exactitude the corresponding computer-stored catalog entry (1). in designing computer aids to catalog searching, it is important to build in safety features that decrease sensitivity to minor errors.
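the four error categories the survey reports (replacement, omission, addition, and transposition, in decreasing order of frequency for the latter three) are exactly the single-edit cases a catalog-matching program would need to recognize. as an illustrative sketch only, not code from the study, a short python routine classifying how a typed author name differs from the intended catalog entry might look like this:

```python
# hypothetical sketch (not from the tagliacozzo, kochen, and rosenberg study):
# classify a typed author name against the intended catalog entry using the
# study's four single-edit error categories.

def classify_error(typed, correct):
    """return the error category if `typed` differs from `correct` by one edit."""
    if typed == correct:
        return None  # exact match, no spelling error
    lt, lc = len(typed), len(correct)
    if lt == lc:
        diffs = [i for i in range(lt) if typed[i] != correct[i]]
        if len(diffs) == 1:
            return "replacement"  # one wrong character
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typed[diffs[0]] == correct[diffs[1]]
                and typed[diffs[1]] == correct[diffs[0]]):
            return "transposition"  # two adjacent characters swapped
    elif lt == lc - 1:
        # typed name is one character short: an omission
        for i in range(lc):
            if typed[:i] + correct[i] + typed[i:] == correct:
                return "omission"
    elif lt == lc + 1:
        # typed name has one extra character: an addition
        for i in range(lt):
            if typed[:i] + typed[i + 1:] == correct:
                return "addition"
    return "other"  # more than one edit apart

# examples against a hypothetical catalog form "tagliacozzo"
print(classify_error("tagliacozza", "tagliacozzo"))  # replacement
print(classify_error("tagliacozo", "tagliacozzo"))   # omission
```

a routine like this could tally error categories over a search log to reproduce the survey’s frequency counts, or help choose a compression code that is insensitive to the most common error types.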
for example, compression coding techniques may be used to minimize the effects of spelling errors on retrieval (2, 3, 4). preliminary to the design of good protection devices, the application of error-correction coding theory (5, 6, 7) and data on error patterns in actual catalog searches (8, 9) may be helpful. a recent survey of catalog use at three university libraries yielded some data of the above-mentioned kind (10). the aim of this paper is to present and analyze those results of the survey which bear on questions of error control in searching a computer-stored catalog. in the survey, users were interviewed at random as they approached the catalog. of the 2,167 users interviewed, 1,489 were searching the catalog for a particular item (“known-item searches”). of these, 67.9% first entered the catalog with an author’s or editor’s name, 26.2% with a title, and 5.9% with a subject heading. approximately half the searchers had a written citation, while half relied on memory for the relevant information. editorial board thoughts bradford lee eden musings on the demise of paper we have been hearing the dire predictions about the end of paper and the book since microfiche was hailed as the savior of libraries decades ago. now it seems that technology may be finally catching up with the hype. with the amazon kindle and the sony reader beginning to sell in the marketplace despite the cost (about $360 for the kindle), it appears that a whole new group of electronic alternatives to the print book will soon be available for users next year. amazon reports that e-book sales quadrupled in 2008 from the previous year. this has many technology firms salivating and hoping that the consumer market is ready to move to digital reading as quickly and profitably as the move to digital music. some of these new devices and technologies are featured in the march 3, 2009, fortune article by michael v.
copeland titled “the end of paper?”1 part of the problem with current readers is their challenges for advertising. because the screen is so small, there isn’t any room to insert ads (i.e., revenue) around the margins of the text. but new readers such as plastic logic, polymer vision, and firstpaper will have larger screens, stronger image resolution, and automatic wireless updates, with color screens and video capabilities just over the horizon.

still, working out a business model for newspapers and magazines is the real challenge. and how much will readers pay for content? with everything “free” over the internet, consumers have become accustomed to information readily available for no immediate cost. so how much to charge, and how to make money selling content? or will the “pay by the article” model, like that used for digital music sales, become the norm?

the plastic logic reader weighs less than a pound, is one-eighth of an inch thick, and resembles an 8½ x 11 inch sheet of paper or a clipboard. it will appear in the marketplace next year, using plastic transistors powered by a lithium battery. while not flexible, it is a very durable and break-resistant device. other e-readers will use flexible display technology that allows one to fold up the screen and place the device into a pocket. much of this technology is fueled by e-ink, a start-up company that is behind the success of the kindle and the reader. they are exploring the use of color and video, but both have problems in terms of reading experience and battery wear. in the long run, however, these issues will be resolved.

expense is the main concern: just how much are users willing to pay to read something in digital rather than analog? amazon has been hugely successful with the kindle, selling more than 500,000 for just under $400 in 2007. and with the drop in subscriptions for analog magazines and newspapers, advertisers are becoming nervous about their futures.
so what should or do these developments mean for libraries? it means that we should probably be exploring the purchase of some of these products when they appear and offering them (with some content) for checkout to our patrons. many of us did something similar when it became apparent that laptops were wanted and needed by students for their use. many of us still offer this service today, even though many campuses now require students to purchase them anyway. offering cutting-edge technology with content related to the transmission and packaging of information is one way for our clientele to see libraries as more than just print materials and a social space. and libraries shouldn’t pay full price (or any price) for these new toys; companies that develop these products are dying to find free research and development focus groups that will assist them in versioning and upgrading their products for the marketplace. what better avenue than college students?

related to this is the recent announcement by the university of michigan that their university press will now be a digital operation to be run as part of the library.2 decreased university and library budgets have meant that university presses have not been able to sell enough of their monographs to maintain viable business models. the move of a university press to a successful scholarly communication and open-source publishing entity like the university of michigan libraries means that the press will be able to survive, and it also indicates that the newer model of academic libraries as university publishers will have a prototypical example to point out to their university’s administration. in the long run, these types of partnerships are essential if academic libraries are to survive their own budget cuts in the future.

references
1. michael v. copeland, “the end of paper?” cnnmoney.com, mar. 3, 2009, http://money.cnn.com/2009/03/03/technology/copeland_epaper.fortune/ (accessed june 22, 2009).
2.
andrew albanese, “university of michigan press merged with library, with new emphasis on digital monographs,” libraryjournal.com, mar. 26, 2009, http://www.libraryjournal.com/article/ca6647076.html (accessed june 22, 2009).

bradford lee eden (eden@library.ucsb.edu) is associate university librarian for technical services and scholarly communication, university of california, santa barbara.

communication
a tale of two tools: comparing libkey discovery to quicklinks in primo ve
jill k. locascio and dejah rubel
information technology and libraries | june 2023 https://doi.org/10.6017/ital.v42i2.16253
jill k. locascio (jlocascio@sunyopt.edu) is associate librarian, suny college of optometry. dejah rubel (dejahrubel@ferris.edu) is metadata and electronic resources management librarian, ferris state university. © 2023.

introduction
consistent delivery of full-text content has been a challenge for libraries since the development of online databases. library systems have attempted to meet this challenge, but link resolvers and early direct linking tools often fell short of patron expectations. in the last several years, a new generation of direct linking tools has appeared, two of which will be discussed in this article: third iron’s libkey discovery and quicklinks by ex libris, a clarivate company. figure 1 shows the “download pdf” link added by libkey. figure 2 shows the “get pdf” link provided by quicklinks. the way we configured our discovery interface, a resource cannot receive both the libkey and quicklinks pdf links. these two direct linking tools were chosen because they were both relatively new to the market in april 2021 when this analysis took place and they can both be integrated into primo ve, the library discovery system of choice at the authors’ home institutions of suny college of optometry and ferris state university.
through analysis of the frequency of direct links, link success rate, and number of clicks, this study may help determine which product is most likely to meet your patrons’ needs.

figure 1. example of a libkey discovery link in primo ve.
figure 2. example of a quicklink in primo ve.

literature review
over the past 20 years link resolvers and direct linking have evolved in tandem. early link generator tools, such as proquest’s sitebuilder, often involved a process that “… proved too cumbersome for most end-users.”1 five years later, tools from ebsco, gale, ovid, and proquest had improved, but they were all proprietary. bickford postulates that metadata-based standards, like openurl, may make linking as simple as copying and pasting from the address bar; however, they may be more likely to fail “… as long as vendors use incompatible, inaccurate, or incomplete metadata.”2

the first research was wakimoto’s 2006 study of sfx, which relied on 224 test queries and 188,944 individual uses for its data set.3 of those queries, 39.7% of search results included a full-text link, and that link was accessed 65.2% of the time. unfortunately, wakimoto also discovered that 22.2% of all full-text results failed and concluded that most complaints against sfx were problems with the systems it links to and not the link resolver itself. although intended to be provider-neutral, the openurl standard is, in fact, vulnerable to metadata omissions. content providers, whether aggregators or publishers, have a vested interest in link stability and platform use and have therefore invested in building direct link generation tools.
in 2006, grogg examined ebsco’s smartlink, which checks access rights before generating the link; proquest’s crosslinks, which was used to link from proquest to another vendor’s content; and silverplatter and links@ovid, which relied on a knowledge base in the terabytes for static links.4 in 2008, cecchino described the national library of medicine’s linkout tool for selected publishers within pubmed.5 they also described two ovid products: links@ovid and linksolver, noting that the former is similar to linkout and the latter is similar to sfx. most of the time these tools worked well, but their use was restricted to a particular platform or set of publishers.

as online public catalogs became discovery layers, direct linking became a feature of the library management system. two studies have been done thus far: silton’s analysis of summon and stuart’s analysis of 360 link. in 2014, silton tested the percentage of full-text articles retrievable from summon by running a test query and examining the first 100 results. over a year, the total success rate for unfiltered queries rose from 61% to 76%. after direct linking was introduced, the success rate of link resolver links rose to 65.8–73% and direct links succeeded 90.48–100% of the time. silton concluded, “while direct linking had some issues in its early months, it generally performs better than the link resolver.”6

in 2011, stuart, varnum, and ahronheim began testing the 1-click feature of 360 link on 579 citations, 82.2% of which were successful. after direct linking became an option for summon in 2012, 61–70% of their sample relied on it. “between direct linking and 1-click about 93 to 94% of the time an attempt was made to lead users directly to the full text of the article … [and] … we were able to reach full text … from 79% to about 84% of the time.”7 direct linking outperformed 1-click with a 90% success rate compared to 58–67% for 1-click.
stuart also compared the actual error rate with one based on user reports and discovered that “relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems by a factor of 100.”8 openurl links were especially alarming, with approximately 20% of them failing. although direct linking is more reliable, stuart closes by noting that direct linking binds libraries closer to vendors, thereby decreasing institutional flexibility.

methods
the goal of this project was to assess two of the latest direct linking tools: ex libris’s native quicklinks feature and third iron’s libkey discovery. we performed a side-by-side comparison of the two tools by searching for specific articles in primo ve, the library discovery system used by the authors’ respective home institutions, suny college of optometry and ferris state university, and measuring
• how often each vendor’s direct links appeared on the brief record;
• success rate of the links; and
• number of clicks it takes from each link to reach the pdf full text.

both suny college of optometry and ferris state university use ex libris’ alma as their library services platform. alma provides a number of usage reports in their analytics module. we sourced the queries used in our analysis from the alma analytics link resolver usage report. the report contains a field, “number of requests,” which records the number of times an openurl request was sent to the link resolver. an openurl request is sent to the link resolver when the user clicks on a link to the link resolver from an outside source (such as google scholar), when the user submits a request using primo’s citation linker, or when the user accesses the article’s full record in primo by clicking on either the brief record’s title or availability statement.
this means that results that have a direct link (whether a quicklink or libkey discovery link) on the brief record will not appear in the report if the user clicked the direct link to the article. thus, in order to create test searches that would be an accurate representation of articles being accessed, we used article titles taken from suny optometry’s october 2019 alma link resolver usage report—a report that was generated prior to the implementation of both libkey discovery and quicklinks. the report was filtered to include only articles with the source type of primo/primo central to ensure that the initial search was taking place within the native primo interface, as requests from outside sources like google scholar or from primo’s citation linker are irrelevant to this analysis. this filtering generated a total of 412 articles. after further removal of duplicates and non-article material, there were 386 article titles in our test query set.

we created two separate primo views as test environments: one with libkey discovery and the other with quicklinks. we ran the test searches twice in each view. in the first round of testing, we recorded whether a direct link was present. we also recorded the name of the full-text provider (if present), as well as whether the article was open access. suny optometry does not filter their primo results by availability; therefore, many of the articles included in the initial search did not have any associated full-text activations. since these articles are irrelevant to our assessment, we removed them before analyzing the first round of data and proceeding with the second search. the exception to these removals were articles identified as open access by unpaywall, as the presence of unpaywall links is independent of any activations in alma. furthermore, third iron’s libkey discovery and ex libris’ quicklinks both incorporate unpaywall’s api into their products to provide direct links to pdfs of open access articles.
this functionality helps fill coverage gaps where institutions may not have activated a hybrid open access journal due to its paywalls. therefore, we are including the presence of direct links resulting from the unpaywall api when determining whether a libkey discovery link or quicklink is present. after filtering for availability, we had 254 article titles for the first round of searching and analysis.

the initial analysis revealed the need to further filter the articles used for the second round of searching, which would provide a much closer comparison of the two direct linking tools, as third iron had partnered with more content providers than ex libris. controlling for shared providers would give a more accurate representation of how each direct linking tool performs in relation to the other. when controlling for shared providers and open access articles, we were left with 145 article titles for the second query set. during the second round of searching, we measured whether the direct link was successful in linking to the full text—meaning that the link was neither broken nor linked to an incorrect article—and how many clicks were necessary to get from the direct link to the article pdf. along the way, additional qualitative measures were observed, such as document download time and metadata record quality. while not as easy to measure as the quantitative data, these observations provided additional insight into the strengths and weaknesses of each of these direct linking tools. since april 2022, when our research was conducted, ex libris has added several quicklinks providers, possibly increasing the current number of quicklinks available. additionally, both rounds of searching were conducted on campus, so our analysis excludes any consideration of authentication and/or proxy information.
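the presence and success percentages reported in the results section below follow from simple tallies over the per-article test records. this sketch is purely illustrative — the field names and the `TestedArticle` structure are our own assumptions, not the authors’ actual spreadsheet — but it shows the kind of computation the methodology implies:

```python
# Hypothetical tally of direct-link test results. Field names are
# illustrative assumptions, not the authors' actual data columns.
from dataclasses import dataclass

@dataclass
class TestedArticle:
    libkey_present: bool
    quicklink_present: bool
    libkey_success: bool = False
    quicklink_success: bool = False

def pct(part: int, whole: int) -> int:
    """Whole-number percentage, matching how the article reports rates."""
    return round(100 * part / whole) if whole else 0

def summarize(articles: list[TestedArticle]) -> dict:
    n = len(articles)
    libkey = [a for a in articles if a.libkey_present]
    quick = [a for a in articles if a.quicklink_present]
    return {
        # presence rates are computed over all tested articles
        "libkey_present_pct": pct(len(libkey), n),
        "quicklink_present_pct": pct(len(quick), n),
        # success rates are computed only over articles where the link appeared
        "libkey_success_pct": pct(sum(a.libkey_success for a in libkey), len(libkey)),
        "quicklink_success_pct": pct(sum(a.quicklink_success for a in quick), len(quick)),
    }
```

note the design choice this mirrors in the study: success rates are measured only against articles where a link actually appeared, so a tool with narrower provider coverage can still post a high success rate.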
results
of the 254 articles searched, 208 (82%) had libkey discovery links present while 129 (52%) had quicklinks present. while this seems like a large discrepancy between the two direct link providers, it can be explained by the fact that during the time of testing, ex libris was collaborating with fewer content providers than third iron. ex libris has since added more providers. while the provider discrepancy meant that there were many instances where a libkey discovery link was present where a quicklink was not, there were 5 articles where a quicklink was present while a libkey discovery link was not.

as mentioned previously, the criterion for the 254 articles included in the second round of searching was that the articles must be activated in alma or must be open access. of these 254 articles, we identified 137 (54%) as open access. of those open access articles, 132 (96%) had libkey discovery links present, and 118 (86%) had quicklinks present. we found that 113 (82%) of the open access articles had both libkey discovery links and quicklinks present. we also discovered within this set of 137 open access articles that 30 (22%) were from non-activated resources. of those 30 open access articles from non-activated titles, all 30 (100%) had libkey discovery links appearing on the brief results and 24 (80%) had quicklinks.

to get a better idea of how libkey discovery links and quicklinks compared in terms of linking success, we filtered to only those articles available from providers who were participating in both libkey discovery links as well as quicklinks. since both direct linking tools use unpaywall integrations, we continued to include open access articles. this filtering resulted in 145 articles, where libkey discovery links were present in 137 articles (94%) while quicklinks were present in 129 articles (89%). we found that 123 (85%) of these 145 articles had both libkey discovery links and quicklinks present.
there were 2 (1%) articles that had neither libkey discovery links nor quicklinks present despite being activated in a journal currently participating as a provider in both direct linking tools. there were also 14 articles (10%) that had libkey discovery links but not quicklinks; all of these articles were open access. in total, of the 145 articles searched, 128 (88%) were identified as open access.

as for the 137 libkey discovery links, 130 (95%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. of the 129 quicklinks, 126 (98%) of them successfully linked to the article. on average it took 1.07 clicks to get to the pdf of the article. we also attempted to measure the time it took for the pages to load after the initial click on the libkey discovery links and quicklinks; however, the tools used to measure this, as well as the environments in which the links were being clicked, proved too varied to provide an appropriate comparison. nevertheless, we noted that page load times after clicking on libkey discovery links and quicklinks were generally consistent, but quicklinks attempts to connect to the wiley platform took a significant time (at least 10 seconds) to load.

conclusions
with high article linking success rates, both third iron’s libkey discovery and ex libris’ quicklinks deliver on the promise to provide fast and seamless access to full-text articles. however, the libkey discovery tool far outpaces quicklinks when it comes to coverage. both direct linking tools perform well with open access articles, supplying libraries with better options for full-text links to articles that may be in hybrid journals. as with any kind of full-text linking, both direct linking tools rely on metadata.
in conclusion, while libkey discovery provides a more complete direct linking solution, both libkey discovery and quicklinks are reliable tools that improve primo’s discovery and delivery experience.

endnotes
1 david bickford, “using direct linking capabilities in aggregated databases for e-reserves,” journal of library administration 41, no. 1/2 (2004): 31–45, https://doi.org/10.1300/j111v41n01_04.
2 bickford, 45.
3 wendy furlan, “library users expect link resolvers to provide full text while librarians expect accurate results,” evidence based library and information practice 1, no. 4 (2006): 60–63, https://doi.org/10.18438/b88c7p.
4 jill e. grogg, “linking without a stand-alone link resolver,” library technology reports 42, no. 1 (2006): 31–34.
5 nicola j. cecchino, “full-text linking demystified,” journal of electronic resources in medical libraries 5, no. 1 (2008): 33–42, https://doi.org/10.1080/15424060802093377.
6 kate silton, “assessment of full-text linking in summon: one institution’s approach,” journal of electronic resources librarianship 26, no. 3 (2014): 163–69, https://doi.org/10.1080/1941126x.2014.936767.
7 kenyon stuart, ken varnum, and judith ahronheim, “measuring journal linking success from a discovery service,” information technology and libraries 34, no. 1 (2015): 52–76, https://doi.org/10.6017/ital.v34i1.5607.
8 stuart, varnum, and ahronheim, 74.
public libraries leading the way
how covid affected our python class at the worcester public library
melody friedenthal
information technology and libraries | december 2021 https://doi.org/10.6017/ital.v40i4.14041
melody friedenthal (mfriedenthal@mywpl.org) is a public services librarian, worcester public library. © 2021.

in june 2020, ital published my account of how the worcester public library (ma) came to offer a class in python programming and how that class was organized. although readers may have read the article in the middle of our covid-year, i wrote it mostly in early january 2020, before libraries across the country closed in an effort to protect staff and patrons from the disease. from spring 2020 through april 2021, i taught intro to coding: python for beginners five more times. but, of course, these classes were not face-to-face. like virtually all other library, musical, political, religious, and cultural programming across the world, our python course was taught virtually. the public services team has one professional zoom account, which my colleagues and i share.

how did going remote affect this class? it depends on whether your perspective is that of a student or that of the instructor. many of us have read how difficult it’s been for teachers to effectively reach their elementary- through college-age students. i’ve had many of the same challenges, but since nearly all my students are adults and they all chose to take this class, i don’t need to grapple with fidgety kids or recess. on the other hand, there were few distractions in our computer lab, while covid-time students have to grapple with pets, children squabbling, or noise from a tv. i was teaching from my home office. at the library i have one monitor but at home i have two, which makes it easier for me to spread out my assorted documents.
to “protect” my students from seeing my messy house, i used a virtual background, one chosen not to distract. however, the software which determines the borders of a human presenter isn’t perfect and there is sometimes a halo behind my head of the things behind me; this may be distracting itself.

prior to covid, since we had twelve seats in the computer lab, we limited registration to fourteen, allowing for some no-shows (and we have two spare laptops, in case everyone showed up). a week prior to session one i would email the registrants, asking them to confirm their continued interest. if a student didn’t confirm, i’d give their seat to someone on the waitlist. while i was not prepared to make my class a mooc (massive open online course) because i individually review homework and give lots of feedback, we did increase maximum registration to fifteen since the number of seats in the computer lab was no longer a limiting factor. and, as before, i ask for confirmation via email, but i also include in that email two links and an attached word doc. the document is an excerpt from cory doctorow’s novel little brother on the joys of coding.

the first embedded link leads to the free version of zoom. the second link is to the thonny website (https://thonny.org). thonny is a free ide (integrated development environment) where students can write and execute python code. we used thonny when i taught face-to-face, but the lab computers all had thonny installed and were ready for students to use. now, i have to depend on the ability of students to download the software to their own computers. i ask students to do the two downloads ahead of session one. which brings us to two problems: the class was no longer accessible to students who live in a household without a computer and internet service.
and, as i found out with one prospective student, it’s not accessible to patrons who don’t have administrative rights to their computer; that is, the ability to download new software.

when a patron confirms their interest, i email them the course manual. it now contains about 93 pages. i told students they might choose to print it, but doing so is up to individual preference. the advantage of having a digital copy is that students can search for keywords easily. the disadvantage is that the cost of printing the manual is shifted to the student and may be prohibitive for some.

in session one, i acknowledged that it’s difficult to learn technical material via zoom, and i encouraged everyone to ask questions during class and to email me if they are stymied while working on the homework. i reiterated that invitation during every session. while teaching, i bounce back and forth between screen-sharing my thonny window and the manual, while trying to keep an eye on the little zoom windows showing my students. some students cannot or choose not to turn on their video. this is a problem for me, since i can’t readily determine who’s asking a question. moreover, it is helpful to associate a face with a name. and since i give out a certificate of completion to each student who does the homework and attends all sessions, i want to make sure the student is actually taking part. i’ve had students who sign in, leave their camera off, and then apparently leave (i call on students by name and sometimes the no-video ones never respond).

offering the class online has advantages in snowy worcester. students can tune in from the comfort of their own homes, avoid the slick roads, bypass paying for parking at the municipal lot next to our building or for a bus to downtown, and skip the discomfort of walking in a dark city center in the evening. another plus: as program organizers and program participants have discovered, with videoconferencing we are no longer limited geographically.
i had registrants who live in pennsylvania and georgia.

as always, students range from total beginners to experienced programmers-of-other-languages. i’ve thought about how i can give extra time to the former while not boring the latter. one thing i’ve done is to make some assignments optional and say, “if you want an extra challenge, give this a try….” i’ve slowed the class down a bit, leaving more time for coding during each session. if a student has difficulties, i invite them to share their screen. this pedagogical technique actually works better via zoom than in-person, because we could all see that screen equally well. in the computer lab, only the student who sat at the same (2-person) desk could easily see what the other person had coded. another thing i’ve done is to ratchet down the formality of the class: i am chattier and demo fun games i’ve written, e.g., hangman, tic-tac-toe, rock-paper-scissors, and you sunk my battleship, for inspiration. i experimented with using the built-in zoom whiteboard but that wasn’t satisfactory, so i wrote supplementary notes as comments in thonny.

parents were fearful their kids were not being intellectually challenged when schools were closed due to the pandemic, so maybe i shouldn’t have been surprised that the april 2021 class contained seven children. there would have been an eighth, but when i realized one registrant was just seven years old, i told his mother that, while she was the best judge of her son’s abilities, i discouraged him from taking the class. she decided to take it herself.

figure 1. a word-cloud of our fall 2020 project outcome evaluations (includes other digital learning programs).

at our sixth and final session i traditionally execute a program which draws colorful graphics, rather like spirograph. students were able to see each curve being drawn in a new window launched by the ide.
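a spirograph-style curve of this kind takes only a few lines of python to generate. the sketch below is a hypothetical reconstruction, not the course’s actual program: it computes the points of a hypotrochoid (the curve a spirograph wheel traces) and pauses briefly before returning them, the same kind of “sleep” workaround for screen-sharing described next. all parameter names here are illustrative.

```python
# Hypothetical sketch of a spirograph-style demo. Computes hypotrochoid
# points; a real class demo would feed these to a drawing window.
import math
import time

def hypotrochoid_points(R=5.0, r=3.0, d=5.0, steps=360):
    """Points of a hypotrochoid: a circle of radius r rolling inside a
    circle of radius R, with the pen at distance d from the small circle's
    center."""
    pts = []
    for i in range(steps):
        t = 2 * math.pi * i / steps
        x = (R - r) * math.cos(t) + d * math.cos((R - r) / r * t)
        y = (R - r) * math.sin(t) - d * math.sin((R - r) / r * t)
        pts.append((x, y))
    return pts

def demo(pause=3):
    # Pause so the presenter has time to share the freshly opened window
    # before the first curve segments appear.
    time.sleep(pause)
    return hypotrochoid_points()
```

the ratio of R to r controls how many “petals” the curve has before it closes, which makes for an easy in-class experiment: change one number, rerun, and watch the shape change.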
but this window doesn’t exist until i executed the program. while we were using zoom, when i attempted to share my screen, the students missed the first graphics, no matter how fast i was at screen-sharing. i made the execution “sleep” for a few seconds to give me time to switch screens before the graphics were drawn.

a larger percentage of students earned the certificate of completion during the virtual classes than on average in the in-person pre-covid classes, perhaps 75% vs. 40%. for the in-person classes our communications officer printed the certificates on heavy paper adorned with the wpl logo; i signed each and handed them out during the final session. for our virtual classes, the certificates were digitally signed and then emailed; students could print them if they chose.

this follow-up is being written during october 2021, and with a substantial percentage of massachusetts residents vaccinated for covid, the worcester public library is now back to offering many programs in-person, including python. the city of worcester requires mask use in all municipal buildings, and while some patrons don’t cooperate, i’ve told my students that anyone not wearing a mask properly will be asked to leave the computer lab. with so many people out of work due to the economic devastation wrought by covid, we were gratified to be able to offer a class that teaches in-demand skills, especially ones that can be applied in a work-from-home environment.

frbrization of a library catalog | dickey
the functional requirements for bibliographic records (frbr)’s hierarchical system defines families of bibliographic relationship between records and collocates them better than most extant bibliographic systems.
certain library materials (especially audio-visual formats) pose notable challenges to search and retrieval; the first benefits of a frbrized system would be felt in music libraries, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections. this report will summarize the benefits of frbr to next-generation library catalogs and opacs, and will review the handful of ils and catalog systems currently operating with its theoretical structure.

editor’s note: this article is the winner of the lita/ex libris writing award, 2007.

the following review addresses the challenges and benefits of a next-generation online public access catalog (opac) according to the functional requirements for bibliographic records (frbr).1 after a brief recapitulation of the challenges posed by certain library materials—specifically, but not limited to, audiovisual materials—this report will present frbr’s benefits as a means of organizing the database and public search results from an opac.2 frbr’s hierarchical system of records defines families of bibliographic relationship between records and collocates them better than most extant bibliographic systems; it thus affords both library users and staff a more streamlined navigation between related items in different materials formats and among editions and adaptations of a work. in the eight years since the frbr report’s publication, a handful of working systems have been developed. the first benefits of such a system to an average academic library system would be felt in a branch music library, but research already has proven its advantages for fine arts, theology, and literature—the bulk of the non-science, technology, and mathematics collections.
■ current search and retrieval challenges

the difficulties faced first, but not exclusively, by music users of most integrated library systems fall into two related categories: issues of materials formats, and issues of cataloging, indexing, and marc record structure. music libraries must collect, catalog, and support materials in more formats than anyone else; this makes their experience of the most common ils modules—circulation, reserves, and acquisitions—by definition more complicated. “the study of music continues to rely on the interrelated use of three distinct information formats—scores (the notated manifestation of a composer’s or improviser’s thought), recordings (realizations in sound, and sometimes video, of such compositions and improvisations), and books and journals (intellectual thought regarding such compositions and improvisations)—music libraries continue to require . . . collections that integrate [emphasis mine] these three information formats appropriately.”3 put a different way, “relatedness is a pervasive characteristic of music materials.”4 this is why frbr’s model of bibliographic relationships offers benefits that will first impact the music collection.5 at present, however, musical formats pose search and retrieval challenges for most ils users, and the problem is certainly replicated with microforms and video recordings. the marc codes distinguish between material formats, but they support only one category for sound recordings, lumping together cd, dvd audio, cassette tape, reel-to-reel tape, and all other types.6 this single “sound recording” definition is easily reflected in opacs (such as those powered by innovative interfaces’ millennium and ex libris’ aleph 500) and union catalogs (such as worldcat.org).7 however, the distinction between sound recording formats is embedded in subfields of the 007 field, which presently cannot be indexed by many library automation systems because the subfields are not adjacent.
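the format distinctions buried in the 007 field are positional character codes, so a system that cannot index them can still recover them by inspecting the string directly. the sketch below illustrates the idea; the category and designation codes follow my reading of marc 21 (007/00 = “s” for sound recording, 007/01 = specific material designation) and should be verified against the marc 21 documentation before use.

```python
# Illustrative sketch: recovering the sound-recording format from a MARC 007
# string. The code-to-format mapping below is an assumption drawn from MARC 21
# conventions, not from any vendor's indexing rules.

SOUND_DESIGNATION = {
    "d": "sound disc",        # CD, LP, etc. (later positions refine speed/size)
    "e": "cylinder",
    "g": "sound cartridge",
    "i": "sound-track film",
    "s": "sound cassette",
    "t": "sound-tape reel",
}

def sound_format(field_007: str) -> str:
    """Return a human-readable format for a sound-recording 007 string."""
    if not field_007 or field_007[0] != "s":
        return "not a sound recording"
    if len(field_007) < 2:
        return "sound recording (unspecified)"
    return SOUND_DESIGNATION.get(field_007[1], "sound recording (other)")

print(sound_format("sd fsngnnmmned"))   # a disc-shaped carrier
print(sound_format("ss lunjlc-----"))   # a cassette
```

an opac that applied even this much parsing at index time could offer “cd” vs. “cassette” facets instead of the single “sound recording” category described above.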
an even more central challenge derives from the fact that music sound recordings—like journals and essay collections—contain within each item more than one work. thus, for one of the central material formats collected by a music library (as well as by a public library or other academic branches), users routinely find themselves searching for a distinct subset of the item record. perversely, though music catalogers do tend to include analytic added-entries for the subparts of a cd recording or printed score, and major ils vendors are learning to index them, aacr2 guidelines set arbitrary cutoff points of about fifteen tracks on a sound recording, and three performable units within a score.8 subsets of essay collections and journal runs are routinely exposed to users’ searches by indexing and abstracting services and major databases, but subsets of libraries’ music collections depend upon catalogers to exploit the marc records for user access.9

timothy j. dickey (dickeyt@oclc.org) is a post-doctoral researcher, oclc office of programs and research, dublin, ohio.

frbrization of a library catalog: better collocation of records, leading to enhanced search, retrieval, and display | timothy j. dickey | information technology and libraries | march 2008

in light of these pervasive bibliographic relationships, catalogers of music (again, with parallels in other subjects) have developed a distinctive approach to the marc metadata schema. in particular, they—with their colleagues in literature, fine arts, and theology—rely upon the 700t field for uniform work titles, and upon careful authority control.10 however, once again, many major ils portals have spotty records in affording access to library collections via these data.
innovative interfaces’ millennium, though it clearly leads other major library products in this market, frequently frustrates music librarians (it is, of course, not alone in doing so).11 its automatic authority control feature works poorly with (necessary) music authority records.12 and even though innovative has been one of the first vendors to add a database index to the 700t field, partly in response to concerns expressed to the company by the music librarians’ user group, millennium apparently does not allow for an appropriate level of follow-through on searching.13 an initial search by the name of a major composer, for instance, yields a huge and cluttered result set containing all indexed 700t fields.14 the results do helpfully include the appropriate see also references, but those references disappear in a subsidiary (limited) search. in addition, the subsidiary display inexplicably changes to an unhelpful arrangement of generic 245 fields (“mozart, symphonies”; “mozart, operas, excerpts”). similar challenges will be faced by other parts of an academic or large public library collection, including the literature collections (for works such as shakespeare’s plays), fine arts (for images and artists’ works), and theology (for works whose uniform title is in latin). the opac interfaces of other major ils vendors fare little better. the same search (for “mozart”) on the emory university library catalog (with an ils by sirsidynix) similarly yields a rich results set of more than one thousand records, and poses similar problems in refining the search.15 in the case of this opac, an index of 700t fields also exists, but it may only be searched from inside a single record; as with millennium, sirsidynix’s interface will then group the next set of results confusingly by 245 fields.
the library corporation’s carl-x apparently does not contain a 700t index; the simple “mozart” search returns a much-simplified set of only 97 results organized by 245a fields, and thus offers a more concise set of results but avoids the most incisive index for audio-visual materials.16 ex libris offers a somewhat more helpful display of its more restricted results; unfortunately for the present comparison, though the detailed results set does list the “format” of all mozart-authored items, the same term—“music”—is used for sound recordings, musical scores, and score excerpts, with no attempt logically to group the results around individual works.17 no 700t index appears present.

■ the frbr paradigm: review of literature and theory

from the earliest library catalogs in the modern age, the tools of bibliographic organization have sought to afford users both access to the collection and collocation of related materials. anglo-american cataloging practice has traditionally served the first function by main entries and alternate access points, and the second function by classification systems. however, as knowledge increases in scope and complexity, the systems of bibliographic control have needed to evolve. as early as the 1950s, theories were developing that sought to distinguish between the intellectual content of a work and its often manifold physical embodiments.18 the 1961 paris international conference on cataloging principles first reified within the cataloging community a work-item distinction, though even the 1988 publication of the anglo-american cataloging rules, 2nd ed., “continued to demonstrate confusion about the nature . . .
of works.”19 meanwhile, extensive research into the nature of bibliographic relationships groped toward a consensus definition of the entity types that could encompass such relationships.20 ed o’neill and diane vizine-goetz examined some one hundred editions of smollett’s the expedition of humphrey clinker over a two-hundred-year span of publication history to propose a hierarchical set of definitions to define entity levels.21 the theoretical entities include the intellectual content of a work—which, in the case of audio-visual works, may not even exist in any printed format—the various versions, editions, and printings in which that intellectual content manifests itself, and the specific copies of each manifestation which a library may hold.22 research has discovered such clusters of bibliographically related entities for as much as 50 percent or more of all the intellectual works in any given library catalog, and as many as 85 percent of the works in a music catalog.23 this work laid the foundation for frbr (and, once again, incidentally underscored the breadth of its applicability to, and beyond, music catalogs). the theoretical framework of frbr is most concisely set forth in the final report of the ifla study group. the long-awaited publication traces its genesis to the 1990 stockholm seminar and the resultant 1992 founding of the ifla study group on functional requirements for bibliographic records. the study group set out to develop: a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities . . . a conceptual model that would serve as the basis for relating specific attributes and relationships . . . to the various tasks that users perform when consulting bibliographic records.
the study makes no a priori assumptions about the bibliographic record itself, either in terms of content or structure.24 in other words, the intention of the group’s deliberations and the final report is to present a model for understanding bibliographic entities and the relationships between them to support information organization tools. it specifically adopts an approach that defines classes of entities based upon how users, rather than catalogers, approach bibliographic records—or, by natural extension, any system of metadata. the frbr hierarchical entities comprise a fourfold set of definitions:

■ work: “a distinct intellectual or artistic creation”;
■ expression: “the intellectual or artistic realization of a work” in any combination of forms (including editions, arrangements, adaptations, translations, performances, etc.);
■ manifestation: “the physical embodiment of an expression of a work”; and
■ item: “a single exemplar of a manifestation.”25

examples of these hierarchical levels abound in the bibliographic universe, but frequently music offers the quickest examples:

■ work: mozart’s die zauberflöte (the magic flute)
■ work: puccini’s la bohème
 ■ expression: the composer’s complete musical score (1896)
  ■ manifestation: edition of the score printed by ricordi in 1897
 ■ expression: an english-language edition for piano and voices
 ■ expression: a performance by mirella freni, luciano pavarotti, and the berlin philharmonic orchestra (october 1972)
  ■ manifestation: a recording of this performance released on 33¹/³ rpm sound discs in 1972 by london records
  ■ manifestation: a re-release of the same performance on compact disc in 1987 by london records
   ■ item: the copy of the compact disc held by the columbus metropolitan library
   ■ item: the copy of the compact disc held by the university of cincinnati

in fact, lis research has tended to demonstrate what music librarians have always
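the fourfold hierarchy is, in effect, a containment structure, and can be sketched as one. the following minimal model uses the la bohème example; the class and field names are illustrative, not drawn from any frbr schema or implementation.

```python
# A minimal sketch of FRBR's four entity levels as a containment hierarchy,
# using Puccini's La bohème. Names are illustrative assumptions, not a schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    holding: str                    # a single exemplar of a manifestation

@dataclass
class Manifestation:
    description: str                # the physical embodiment of an expression
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    description: str                # a realization of the work (edition, performance, ...)
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str                      # a distinct intellectual or artistic creation
    expressions: List[Expression] = field(default_factory=list)

boheme = Work("Puccini, La bohème", [
    Expression("1972 Freni/Pavarotti/Berlin Philharmonic performance", [
        Manifestation("1972 London Records 33 1/3 rpm discs"),
        Manifestation("1987 London Records CD re-release", [
            Item("copy held by the Columbus Metropolitan Library"),
            Item("copy held by the University of Cincinnati"),
        ]),
    ]),
])

# Collocation falls out of the structure: every item under a work is reachable.
copies = [i.holding for e in boheme.expressions
          for m in e.manifestations for i in m.items]
print(copies)
```

the point of the sketch is that once records are linked this way, collocating all holdings of a work is a simple traversal rather than a keyword search.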
understood—that relatedness among items and complexity of families is most prevalent in audio-visual collections. even before the ifla report had been penned, sherry vellucci had set out the task: “to create new catalog structures that better serve the needs of the music user community, it is important first to understand the exact nature and complexity of the materials to be described in the catalog.”26 even limiting herself to musical scores alone (that is, no recordings or monographs), vellucci found that more than 94.8 percent of her sample exhibited at least one bibliographic relationship with another entity in the collection; she further related this finding to the very “inherent nature of music, which requires performance for its aural realization,” as opposed to, for example, monographic book printing.27 vellucci and others have frequently commented on how the relatedness of manifestations—in different formats, arrangements, and abridgements—of musical works continues to be a problem for information retrieval in the world of music bibliography.28 musical works have been variously and industriously described by musicologists and music bibliographers. yet, in the information retrieval domain [and, i might add, under both aacr and aacr2] . . . systems for bibliographic information retrieval . . . have been designed with the document as the key entity, and works have been dismissed as too abstract . . .29 the work is the access point many users will bring—in their minds, and thus in their queries—to a system. they intend, however, to discover, identify, and obtain specific manifestations of that work. 
very recently, research has begun to demonstrate that the frbr model can offer specific advantages to music retrieval in cases such as these: “the description of bibliographic data in a frbr-based database leads to less redundancy and a clearer presentation of the relationships which are implicit in the traditional databases found in libraries today.”30 explorations of the theory in view of the benefits to other disciplines, such as audio-visual and other graphic materials, maps, oral literature, and rare books, have appeared in the literature as well.31 the admitted weakness of the frbr theory, of course, is that it remains a theory at its inception, with still precious few working applications.

■ frbr applications

working implementations of frbr to catalogs, opacs, and ilss are still relatively few but promise much for the future. the frbr theoretical framework has remained an area of intense research at oclc, which has even led to some prototype applications and, very recently, deployment in the worldcat local interface.32 a scattered few other researchers have crafted frbr catalogs and catalog displays for their own ends; the library of congress has a prototype as well. innovative, the leading academic ils vendor, announced a frbr feature for 2005 release, yet shelved the project for lack of a beta-testing partner library.33 ex libris’ primo discovery tool, one other complete ils (by visionary technologies for library systems, or vtls), and the national library of australia have each deployed operational frbr applications.34 the number of projects testifies to the high level of interest among the cataloging and information science communities, while the relatively small number of successful applications testifies to the difficulties faced.
oclc has engaged in a number of research projects and prototypes in order to explore ways that frbrization of bibliographic records could enhance information access. oclc research frequently notes the potential streamlining of library cataloging by frbrization; in addition, they have experienced “superior presentation” and “more intuitive clustering” of search results when the model is incorporated into systems.35 work-level definitions stand behind such oclc research prototypes as audience level, dewey browser, fictionfinder, xisbn, and live search. in every case, researchers determined that, though it was very difficult to automate any identification of expressions, application of work-level categories both simplifies and improves search result sets.36 an algorithm common to several of these applications is freely available as an open source application, and now as a public interface option in oclc’s worldcat local.37 the algorithm creates an author/title key to cluster worksets (often at a higher level than the frbr work, as in the case of the two distinct works that are the book and screenplay for gone with the wind). in the public search interface, the results sets may be grouped at the work level; users may then execute a more granular search for “all editions,” an option that then displays the group of expressions linked to the work record. unfortunately, as the software does not use 700t fields (its intention is to travel up the entity hierarchy, and it uses the 1xx, 24x, and 130 fields), its usefulness in solving the above challenges may not be immediate. a somewhat similar application (though merrilee proffitt declares it not to be a frbr product) was redlightgreen, a user interface for the ex-rlg union catalog based upon quasi-frbr clustering.38 the reports from designers of other automated systems offer interesting commentaries on the process.
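the core idea of an author/title clustering key is straightforward and can be sketched in a few lines. this is a deliberate simplification for illustration only, not oclc’s actual work-set algorithm; the normalization steps and record fields below are my assumptions.

```python
# Sketch of author/title-key clustering: normalize the author and title,
# join them into a key, and group records sharing that key into a workset.
# A simplification for illustration, not OCLC's published algorithm.
import re
import unicodedata
from collections import defaultdict

def workset_key(author: str, title: str) -> str:
    """Build a normalized author/title clustering key (illustrative)."""
    def norm(s: str) -> str:
        # fold diacritics, lowercase, strip punctuation, collapse spaces
        s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
        s = re.sub(r"[^a-z0-9 ]", "", s.lower())
        return re.sub(r"\s+", " ", s).strip()
    return norm(author) + "/" + norm(title)

records = [
    {"author": "Mozart, Wolfgang Amadeus", "title": "Die Zauberflöte", "format": "score"},
    {"author": "mozart wolfgang amadeus", "title": "Die Zauberflote", "format": "CD"},
    {"author": "Puccini, Giacomo", "title": "La bohème", "format": "LP"},
]

worksets = defaultdict(list)
for rec in records:
    worksets[workset_key(rec["author"], rec["title"])].append(rec)

for key, cluster in worksets.items():
    print(key, "->", len(cluster), "record(s)")
```

note how the score and the cd of die zauberflöte fall into one cluster despite differing punctuation and diacritics; that grouping is exactly what lets a public interface present a single work-level result with an “all editions” link.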
the team building an automatically frbrized database and user interface for austlit—a new union collection of australian literature among eight academic libraries and the national library of australia—acknowledged some difficulty with non-monographic works such as poems, though the majority of their database consisted of simpler work-manifestation pairs.39 based on strongly positive user feedback (“the presentation of information about related works [is] both useful and comprehensible”), a similar application was attempted on the australian national music gateway musicaustralia; it is unclear whether the project was shelved due to difficulties in automating the frbrization process.40 one recent application created for the perseus digital library adopts a somewhat different approach.41 rather than altering previously created marc records to allow hierarchical relationships to surface, this team created new records using crosswalks between marc and, for instance, mods, for work-level records. they claim some moderate level of success; though once again, their discussion of the process is more illuminating than their product. mimno and crane successfully allowed a single manifestation-level record to link upwards to many expressions, a necessary analytic feature especially for dealing with sound recordings. they did practically demonstrate the difficulty of searching elements from different levels of the hierarchy at the same time (such as work title and translator), a complication predicted by yee.42 three ils vendors have released products that use the frbr model: portia (visualcat), ex libris (primo), and vtls (virtua).43 the first product, a cataloging utility from a smaller player in the vendor market, claims to incorporate frbr into its metadata capture, yet the information available does not explain how, nor do they offer an opac to exploit it. 
the 2007 release of ex libris’ primo offers what the company calls “frbr groupings” of results.44 this discovery tool is not itself an ils, but promises to interoperate with major existing ils products to consolidate search results. it remains unclear at this time how ex libris’ “standard frbr algorithms” actually group records; the single deployment in the danish royal library allows searching for more records with the same title, for instance, but does not distinguish between translations of the same work.45 vtls, on the other hand, has since 2004 offered a complete product that has the potential to modify existing marc records—via local linking tags in the 001 and 004 fields—to create frbr relationships.46 their own studies agreed with oclc that a subset, roughly 18 percent, of existing catalog records (most heavily concentrated in music collections) would benefit from the process, and they thus allow for “mixed” catalogs, with only subsets (or even individually selected records) to be frbrized. the company’s own information suggests relatively simple implementation by library catalogers, coupled with robust functionality for users, and may be the leading edge of the next generation of catalog products.

■ frbr solutions

the ifla study group, following its user-centered approach, set out a list of specific tasks that users of a computer-aided catalog should be able to accomplish:

■ to find all manifestations embodying certain criteria, or to find a specific manifestation given identifying information about it;
■ to identify a work, and to identify expressions and manifestations of that work;
■ to select among works, among expressions, and among manifestations; and
■ to obtain a particular manifestation once selected.

it seems clear that the frbr model offers a framework of relationships that can aid each task.
unfortunately, none of the currently available commercial solutions may be in itself completely applicable for a single library. the oclc work-set algorithm is open source, as well as easily available through worldcat local, but it only works to create super-work records; it also ignores the 700t field so crucial to many of the issues noted above. none of the other home-grown applications may have code available to an institution. the virtua module from vtls offers a very tempting solution, but may require a change of vendor.47 either adapting one of these solutions or designing a local application, then, raises the question: what would the ideal system entail? catalog frbrization will transpire in two segments: enhancing the existing catalog so that bibliographic relationships surface in the retrieval phase, and designing or adapting a new interface and display to reflect the relationships.48 the first task may prove the more formidable, due to the size of even a modest catalog database and the difficulties often observed in automating such a task; while the librarians constructing the austlit system found that a relatively high percentage of records could be transferred en masse, the oclc research team had difficulty automatically pinpointing expressions from current marc records.49 despite current technology trends toward users’ application of tags, reviews, and other metadata, a task as specialized as adding bibliographic relationships to the catalog demands specialized cataloging professionals.50 the best approach within a current library structure may be to create a single new position to head the project and to act as liaison with cataloging staff in the various branches and with vendor staff, if applicable. each library branch may judge on its own the proportions of records to frbrize, beginning with high-traffic works and authors, those for whom search results tend to be the most overwhelming and confusing to users.
each branch can be responsible for allocation of cataloging staff effort to the process, and will thus have specialist oversight of subsets of the database. three technical solutions to actually changing the database structure have been attempted in the literature to date: incrementally improving the existing marc records to better reflect bibliographic relationships, adding local linking tags, and simply creating new metadata schemas. the vtls solution of adding local linking tags seems most appropriate; relationships between records are created and maintained via unique identifiers and linking statements in the 001 and 004 fields.51 oclc’s open source software could expedite the creation of work-level records, and the creation of expression-level records will be made easier by the large amount of bibliographic information already present in the current catalog. wherever possible, cataloging staff also should take the opportunity to verify or create links to authority files so as to enhance retrieval.52 creating a new catalog display option could be accomplished via additions to current opac coding, either by adopting worldcat local or by designing parts of a new local interface. it need not even require a complete revision; the single site (ucl) currently deploying vtls’ frbrized interface maintains a mixed catalog and offers, once again, a highly intuitive model.53 when a searcher comes across a bibliographic record for which frbr linking is available, they may click a link to open a new display screen. we should strive, however, to use simple interface statements such as “view all different kinds of holdings,” “this work has x editions, in y languages” or “this version of the work has been published z times” (both the oclc prototype and the austlit gateway offer such helpful and user-friendly statements). 
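the linking-tag approach that the article attributes to vtls, combined with the tree display recommended above, can be sketched together: each record carries its own identifier and, when frbrized, a pointer to its parent record. the field semantics below (“001” as record id, “004” as parent link) are assumptions taken from the article’s description, not from vtls documentation, and the record shapes are illustrative.

```python
# Sketch of the local-linking-tag idea: "001" holds a record's identifier and
# "004" links a FRBRized record to its parent. Field semantics are assumptions
# based on the article's description of VTLS Virtua, not vendor documentation.
from collections import defaultdict

records = [
    {"001": "w1", "level": "work", "label": "Puccini, La bohème"},
    {"001": "e1", "004": "w1", "level": "expression",
     "label": "1972 Berlin Philharmonic performance"},
    {"001": "m1", "004": "e1", "level": "manifestation",
     "label": "1972 London Records LP"},
    {"001": "m2", "004": "e1", "level": "manifestation",
     "label": "1987 London Records CD re-release"},
]

by_id = {r["001"]: r for r in records}
children = defaultdict(list)
for rec in records:
    if "004" in rec:                 # records without a link stay standalone,
        children[rec["004"]].append(rec["001"])  # which permits a "mixed" catalog

def tree(rec_id: str, depth: int = 0) -> list:
    """Render a FRBR family as an indented tree, one entity per line."""
    rec = by_id[rec_id]
    lines = ["  " * depth + f"{rec['level']}: {rec['label']}"]
    for child in children[rec_id]:
        lines.extend(tree(child, depth + 1))
    return lines

print("\n".join(tree("w1")))
```

because unlinked records are simply absent from the `children` map, the same catalog can hold frbrized and un-frbrized records side by side, which matches the “mixed” catalogs the vendor is said to allow.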
though the foundational work of both tillett and smiraglia focused upon taxonomies of relationships, the hierarchical structure of the ifla proposal should remain at the forefront of the display, with a secondary organization by type of relationship or type of entity. rather than adopting a design which automatically refreshes at each click, a tree organization of the display should be more user-friendly, allowing users to maintain a visual sense of the organization that they are encountering (see appendix for screenshots of this type of tree display).54 format information should be included in the display, as an indication of a user’s primary category, as well as a distinction among expressions of a work. with these changes, the library catalog will begin to afford its users better access to many of its core collections. frbrization of even part of the catalog—concentrating on high-incidence authors, as identified by subject specialists—will allow it better to reflect, and collocate, items within the families of bibliographic relationships that have been acknowledged a part of library collections for decades. this increased collocation will begin to counteract the pitfalls of mere keyword searching on the part of users, especially in conjunction with renewed authority work. finally, frbr offers a display option in a revamped opac that is at the same time simpler than current result lists, and more elegant in its reflection of relatedness among items. each feature should better enable the users of our catalog to find, select, and obtain appropriate resources, and will bring our libraries into the next generation of cataloging practice.

references and notes

1. ifla committee on the functional requirements for bibliographic records, final report (munich: k. g. saur, 1998); see also http://www.ifla.org/vii/s13/wgfrbr/bibliography.htm (accessed mar. 10, 2007).
2.
this paper began as a graduate research assignment for lis 60640 (library automation), in the kent state university mlis program, march 19, 2007. my thanks to jennifer hambrick, nancy lensenmayer, and joan lippincott for their helpful comments on earlier drafts. the curricular assignment asked for a library automation proposal in a specific library setting; the original review contained a set of recommendations concerning frbr through the lens of a (fictional) medium-sized academic library system, that of st. hildegard of bingen catholic university. as will be noted below, the branch music library typically serves a small population of music majors (graduate and undergraduate) within such an institution, but also a large portion of the student body that use the library’s collection to support their music coursework and arts distribution requirements. any music library’s proportion of the overall system’s holdings may be relatively small, but will include materials in a diverse set of formats: monographs, serials, musical scores, sound recordings in several formats (cassette tapes, lps, cds, and streaming audio files), and a growing collection of video recordings, likewise in several formats (vhs, laser discs, and dvd). it thus offers an early test case for difficulties with an automated library system.
3. dan zager, “collection development and management,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 569.
4. sherry l. vellucci, “music metadata and authority control in an international context,” notes—quarterly journal of the music library association 57, no. 3 (mar. 2001): 541.
5. the opac for the university of huddersfield library system famously first deployed a search option for related items (“did you mean . . . ?”); http://www.hud.ac.uk/cls (accessed july 10, 2007). frbr not only offers the related item search, but also logically groups related works throughout the library catalog.
6.
allyson carlyle demonstrated empirically that users value an object’s format as one of the first distinguishing features: “user categorization of works: toward improved organization of online catalog displays,” journal of documentation 55, no. 2 (mar. 1999): 184–208 at 197.
7. millennium will feature heavily in the following discussion, both because of its position leading the academic library automation market (being adopted wholesale by, for instance, the ohio statewide academic library consortium), and because it was the subject of the original paper.
8. see alastair boyd, “the worst of both worlds: how old rules and new interfaces hinder access to music,” caml review 33, no. 3 (nov. 2005), http://www.yorku.ca/caml/review/33-3/both_worlds.htm (accessed mar. 12, 2007); michael gorman and paul w. winkler, eds., anglo-american cataloging rules, 2nd ed. (chicago: ala, 1988).
9. in the past few years, a small subset of the search literature has described technical efforts to develop search engines that can query by musical example; see j. stephen downie, “the scientific evaluation of music information retrieval systems: foundations and future,” computer music journal 28, no. 2 (summer 2004): 12–23. a company called melodis corporation has recently announced a successful launch of a query-by-humming search engine, though a verdict from the music community remains out; http://www.midomi.com (accessed jan. 31, 2007).
10. see vellucci, “music metadata and authority control in an international context”; richard p. smiraglia, “uniform titles for music: an exercise in collocating works,” cataloging and classification quarterly 9, no. 3 (1989): 97–114; steven h. wright, “music librarianship at the turn of the century: technology,” notes—quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97.
each author builds upon the foundational work of barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california at los angeles, 1987).
11. “at conferences, [my colleagues] are always groaning if they are a voyager client,” interview with an academic music librarian by the author, feb. 9, 2007.
12. several prominent music librarians only discovered that innovative’s system had such a feature when instances of the automatic system’s changing carefully crafted music authority records were discovered; mark sharff (washington university in st. louis) and deborah pierce (university of washington), postings to innovative music users’ group electronic discussion list, oct. 6, 2006, archive accessed feb. 1, 2007.
13. music librarians are the only subset of the millennium users to have formed their own innovative users’ group. sirsidynix has a separate users’ group for stm librarians, and ex libris hosts a law librarians’ users’ group, two other groups whose interaction with the ils poses discipline-specific challenges.
14. searches were tested on the ohio state university libraries’ opac, http://library.osu.edu (accessed mar. 10, 2007).
15. http://www.emory.edu/libraries.cfm (accessed june 27, 2007).
16. searches performed on the library of oklahoma state university, http://www.library.okstate.edu (accessed june 27, 2007); tlc has considered making frbrization a possible feature of their product. they offer some concatenation of “intellectually similar bibliographic records,” and “tlc continues to monitor emerging frbr standards”; don kaiser, personal communication to the author, july 8, 2007. i was unable to reach representatives of sirsidynix on this issue.
17. searches performed on the mit library catalog, powered by aleph 500, http://libraries.mit.edu (accessed june 27, 2007).
18.
eva verona, “literary unit versus bibliographic unit [1959],” in foundations of descriptive cataloging, ed. michael carpenter and elaine svenonius, 155–75 (littleton, colo.: libraries unlimited, 1985), and seymour lubetzky, principles of cataloging, final report phase i: descriptive cataloging (los angeles: institute for library research, 1969), are usually credited with the foundational work on such theories; see richard p. smiraglia, the nature of “a work”: implications for the organization of knowledge (lanham, md.: scarecrow, 2001), 15–33, to whom the following overview is indebted.
19. anglo-american cataloging rules, cited in smiraglia, the nature of “a work,” 33.
20. among the many library and information science thinkers contributing to this body of research, the most prominent have been patrick wilson, “the second objective,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 5–16 (san diego: academic publ., 1989); edward t. o’neill and diane vizine-goetz, “bibliographic relationships: implications for the function of the catalog,” in the conceptual foundations of descriptive cataloging, ed. elaine svenonius, 167–79 (san diego: academic publ., 1989); barbara ann tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (ph.d. diss., university of california, los angeles, 1987); eadem, “bibliographic relationships,” in relationships in the organization of knowledge, ed. carol a. bean and rebecca green, 19–35 (dordrecht: kluwer, 2001) (summary of her dissertation findings on 19–20); martha m. yee, “manifestations and near-equivalents: theory with special attention to moving-image materials,” library resources and technical services 38, no. 3 (1994): 227–55.
21. o’neill and vizine-goetz, “bibliographic relationships”; see also edward t.
o’neill, “frbr: application of the entity-relationship model to humphrey clinker,” library resources and technical services 46, no. 4 (oct. 2002): 150–59. 22. theorists in music semiotics who have more or less profoundly influenced music librarians’ view of their materials include jean-jacques nattiez, music and discourse: toward a semiology of music, trans. carolyn abbate (princeton, n.j.: princeton univ. pr., 1990), and lydia goehr, the imaginary museum of musical works (new york: oxford univ. pr., 1992). see also smiraglia, the nature of “a work,” 64. for a concise overview of how semiotic theory has influenced thinking about literary texts, see d. c. greetham, theories of the text (oxford: oxford univ. pr., 1999), 276–325. 23. studies have found families of derivative bibliographic relationships in 30.2 percent of all worldcat records, 49.9 percent of records in the catalog of georgetown university library, 52.9 percent in the burke theological library (union theological seminary), 57.9 percent of theological works in the new york university library, and 85.4 percent in the sibley music library at the eastman school of music (university of rochester). see smiraglia, the nature of “a work,” 87, who cites richard p. smiraglia and gregory h. leazer, “derivative bibliographic relationships: the work relationship in a global bibliographic database,” journal of the american society for information science 50 (1999): 493–504; richard p. smiraglia, “authority control and the extent of derivative bibliographic relationships” (ph.d. diss., university of chicago, 1992); richard p. smiraglia, “derivative bibliographic relationships among theological works,” proceedings of the 62nd annual meeting of the american society for information science (medford, n.j.: information today, 1999): 497–506; and sherry l.
vellucci, “bibliographic relationships among musical bibliographic entities: a conceptual analysis of music represented in a library catalog with a taxonomy of the relationships” (d.l.s. diss., columbia university, 1994). 24. ifla, final report, 2–3. 25. ibid., 16–23. 26. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md.: scarecrow, 1997), 1. 27. ibid., 238, 251. 28. vellucci, “music metadata”; richard p. smiraglia, “musical works and information retrieval,” notes: quarterly journal of the music library association 58, no. 4 (june 2002). patrick le boeuf notes that users of music collections often use the single word “score” to indicate any one of the four frbr entities; “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco,” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 103–23 at 105–06 (new york: haworth, 2005). 29. smiraglia, “musical works and information retrieval,” 2. 30. marte brenne, “storage and retrieval of musical documents in a frbr-based library catalogue” (master’s thesis, oslo university college, 2004), 79. see also john anderies, “enhancing library catalogs for music,” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; powerpoint presentation accessed mar. 12, 2007, from http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt; boyd, “the worst of both worlds.” 31. see the extensive bibliography compiled by ifla, cataloging division: “frbr bibliography,” http://www.ifla.org/vii/s13/wgfrbr.bibliography.htm (accessed mar. 10, 2007). 32. the first ils deployment of the worldcat local application using frbr is with the university of washington libraries: http://www.lib.washington.edu (accessed june 27, 2007). 33. innovative interfaces, inc., “millennium 2005 preview: frbr support,” inn-touch (june 2004), 9.
interestingly, the one-page advertisement for the new service chose a musical work, puccini’s opera la bohème, to illustrate how the sorting would work. innovative interfaces booth staff at the ala national conference, washington, d.c., june 24, 2007, told the author the company has moved in a different development direction now (investing more heavily in faceted browsing). 34. denmark’s det kongelige bibliotek has been the first ex libris partner library to deploy primo, http://www.kb.dk/en (accessed july 10, 2007). the vtls system has been operating since 2004 at the université catholique de louvain, http://www.bib.ucl.ac.be (accessed mar. 15, 2007). for austlit, see http://www.austlit.edu.au (accessed mar. 14, 2007). 35. rick bennett, brian f. lavoie, and edward t. o’neill, “the concept of a work in worldcat: an application of frbr,” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. work-level records allow manifestation and item records to inherit labor-intensive subject classification metadata; eric childress, “frbr and oclc research,” paper presented at the university of north carolina-chapel hill, apr. 10, 2006, http://www.oclc.org/research/presentations/childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). 36. thomas b. hickey, edward t. o’neill, and jenny toves, “experiments with the ifla functional requirements for bibliographic records (frbr),” d-lib 8, no. 9 (sept. 2002), http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007). 37. thomas b. hickey and jenny toves, “frbr work-set algorithm,” apr. 2005 report, http://www.oclc.org/research/projects/frbr/default.htm (accessed mar. 12, 2007); algorithm available at http://www.oclc.org/research/projects/frbr/algorithm.htm. on worldcat local, see above, note 32. 38.
merrilee proffitt, “redlightgreen: frbr between a rock and a hard place,” http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). redlightgreen has been discontinued, and some of its technology incorporated into worldcat local. 39. http://www.austlit.edu.au (accessed mar. 14, 2007), but unfortunately a subscription database at this time, and thus unavailable for operational comparison. see marie-louise ayres, “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia,” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54, http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007). 40. ibid. 41. see david mimno and gregory crane, “hierarchical catalog records: implementing a frbr catalog,” d-lib 11, no. 10 (oct. 2005); http://www.dlib.org/dlib/october05/crane/10crane.html (accessed mar. 12, 2007). 42. ibid. see also martha m. yee, “frbrization: a method for turning online public finding lists into online public catalogs,” information technology and libraries 24, no. 3 (2005): 77–95, http://repositories.cdlib.org/postprints/715 (accessed mar. 12, 2007). 43. portia, “visualcat overview,” http://www.portia.dk/pubs/visualcat/present/visualcatoverview20050607.pdf (accessed mar. 14, 2007); vtls, inc., “virtua,” http://www.vtls.com/brochures/virtua.pdf (accessed mar. 14, 2007). 44. http://www.exlibrisgroup.com/primo_orig.htm (accessed july 10, 2007). 45. syed ahmed, personal communication to the author, july 10, 2007; searches run july 10, 2007, on http://www.kb.dk/en. the library’s holdings of manifestations of mozart’s singspiel opera, the magic flute, run to four different groupings on this catalog: one under the title “die zauberflöte,” one under the title “la flute enchantée: opéra fantastique en 4 actes,” and two separate groups under the title “tryllefløtjen.” 46.
“vtls announces first production use of frbr,” http://www.vtls.com/corporate/releases/2004/6.shtml (accessed mar. 14, 2007). unfortunately, though this press release indicates commitments on the part of the université catholique de louvain and vaughan public libraries (ontario, canada) to use fully frbrized catalogs, only the first is operating in this mode as of july 2007, and with only a subset of its catalog adapted. 47. virtua is not interoperable, for instance, with any of innovative’s other ils modules, which continue to dominate a number of larger academic consortia; john espley, vtls inc. director of design, personal communication to the author, mar. 15, 2007. 48. see allyson carlyle, “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays,” library resources and technical services 41, no. 2 (1997): 79–100. 49. even at the work-level, yee distinguished fully eight different places in a marc record in which the identity of a work may be located, “frbrization,” 79–80. 50. gregory leazer and richard p. smiraglia imply that cataloger-based “maps” of bibliographic relationships are inadequate; “bibliographic families in the library catalog: a qualitative analysis and grounded theory,” library resources and technical services 43, no. 4 (1999): 191–212. the cataloging failures they describe, however, are more a result of inadequacies in the current rules and practice, and do not really prove that catalogers have failed in the task of creating useful systems. 51. vinod chachra and john espley, “differentiating libraries through enriched user searching: frbr as the next dimensions in meaningful information retrieval,” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). 52. see yee, “frbrization.” 53. http://www.bib.ucl.ac.be (accessed mar. 15, 2007). 54.
not only does the ex libris primo application need clickthroughs, it creates a new window for an extra step before presenting a new group of records. bibliography anderies, john. “enhancing library catalogs for music.” paper presented at the conference on music and technology in the liberal arts environment, hamilton college, june 22, 2004; http://academics.hamilton.edu/conferences/musicandtech/presentations/catalog-enhancements.ppt (accessed mar. 12, 2007). ayres, marie-louise. “case studies in implementing functional requirements for bibliographic records: austlit and musicaustralia.” alj: the australian library journal 54, no. 1 (feb. 2005): 43–54; http://www.nla.gov.au/nla/staffpaper/2005/ayres1.html (accessed mar. 12, 2007). bennett, rick, brian f. lavoie, and edward t. o’neill. “the concept of a work in worldcat: an application of frbr.” library collections, acquisitions, and technical services 27, no. 1 (spring 2003): 45–60. boyd, alistair. “the worst of both worlds: how old rules and new interfaces hinder access to music.” caml review 33, no. 3 (nov. 2005); http://www.yorku.ca/caml/review/33-3/both_worlds.htm (accessed mar. 12, 2007). brenne, marte. “storage and retrieval of musical documents in a frbr-based library catalogue.” master’s thesis, oslo university college, 2004. carlyle, allyson. “fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays.” library resources and technical services 41, no. 2 (1997): 79–100. ______. “user categorization of works: toward improved organization of online catalog displays.” journal of documentation 55, no. 2 (mar. 1999): 184–208. chachra, vinod, and john espley. “differentiating libraries through enriched user searching: frbr as the next dimensions in meaningful information retrieval.” powerpoint presentation, http://www.vtls.com/corporate/frbr.shtml (accessed mar. 10, 2007). childress, eric.
“frbr and oclc research.” paper presented at the university of north carolina-chapel hill, apr. 10, 2006; http://www.oclc.org/research/presentations/childress/20060410-uncch-sils.ppt (accessed mar. 12, 2007). hickey, thomas b., and edward o’neill. “frbrizing oclc’s worldcat.” in functional requirements for bibliographic records (frbr): hype or cure-all? ed. patrick le boeuf, 239–51. new york: haworth, 2005. hickey, thomas b., and jenny toves. “frbr work-set algorithm.” apr. 2005 report; http://www.oclc.org/research/frbr (accessed mar. 12, 2007). hickey, thomas b., edward t. o’neill, and jenny toves. “experiments with the ifla functional requirements for bibliographic records (frbr).” d-lib 8, no. 9 (sept. 2002); http://www.dlib.org/dlib/september02/hickey/09hickey.html (accessed mar. 12, 2007). ifla study group on the functional requirements for bibliographic records. functional requirements for bibliographic records: final report. munich: k. g. saur, 1998. layne, sara shatford. “subject access to art images.” in introduction to art image access: issues, tools, standards, strategies, murtha baca, ed., 1–18. los angeles: getty research institute, 2002. leazer, gregory, and richard p. smiraglia. “bibliographic families in the library catalog: a qualitative analysis and grounded theory.” library resources and technical services 43, no. 4 (1999): 191–212. le boeuf, patrick. “musical works in the frbr model or ‘quasi la stessa cosa’: variations on a theme by umberto eco.” in functional requirements for bibliographic records (frbr): hype or cure-all? patrick le boeuf, ed., 103–23. new york: haworth, 2005. markey, karen. subject access to visual resources collections: a model for computer construction of thematic catalogs. new york: greenwood, 1986. mimno, david, and gregory crane. “hierarchical catalog records: implementing a frbr catalog.” d-lib 11, no. 10 (oct.
2005); http://www.dlib.org/dlib/october05/crane/10crane.html (accessed mar. 12, 2007). o’neill, edward t. “frbr: application of the entity-relationship model to humphrey clinker.” library resources and technical services 46, no. 4 (oct. 2002): 150–59. o’neill, edward t., and diane vizine-goetz. “bibliographic relationships: implications for the function of the catalog.” in the conceptual foundations of descriptive cataloging. elaine svenonius, ed., 167–79. san diego: academic publ., 1989. proffitt, merrilee. “redlightgreen: frbr between a rock and a hard place.” paper presented at the 2004 ala annual conference, orlando, fla.; http://www.ala.org/ala/alcts/alctsconted/presentations/proffitt.pdf (accessed mar. 12, 2007). smiraglia, richard p. bibliographic control of music, 1897–2000. lanham, md.: scarecrow and music library association, 2006. ______. “content metadata: an analysis of etruscan artifacts in a museum of archaeology.” cataloging and classification quarterly 40, no. 3/4 (2005): 135–51. ______. “musical works and information retrieval.” notes: quarterly journal of the music library association 58, no. 4 (june 2002): 747–64. ______. the nature of “a work”: implications for the organization of knowledge. lanham, md.: scarecrow, 2001. ______. “uniform titles for music: an exercise in collocating works.” cataloging and classification quarterly 9, no. 3 (1989): 97–114. tillett, barbara ann. “bibliographic relationships.” in relationships in the organization of knowledge. carol a. bean and rebecca green, eds., 19–35. dordrecht: kluwer, 2001. vellucci, sherry l. bibliographic relationships in music catalogs. lanham, md.: scarecrow, 1997. ______. “music metadata and authority control in an international context.” notes: quarterly journal of the music library association 57, no. 3 (mar. 2001): 541–54. wilson, patrick. “the second objective.” in the conceptual foundations of descriptive cataloging. elaine svenonius, ed., 5–16. san diego: academic publ., 1989.
wright, h. s. “music librarianship at the turn of the century: technology.” notes: quarterly journal of the music library association 56, no. 3 (mar. 2000): 591–97. yee, martha m. “frbrization: a method for turning online public finding lists into online public catalogs.” information technology and libraries 24, no. 3 (2005): 77–95; http://repositories.cdlib.org/postprints/713 (accessed mar. 12, 2007). ______. “manifestations and near-equivalents: theory with special attention to moving-image materials.” library resources and technical services 38, no. 3 (1994): 227–55. zager, daniel. “collection development and management.” notes: quarterly journal of the music library association 56, no. 3 (2000): 567–73. information technology and libraries | march 2008. appendix: examples of a frbrized tree display. a search on also sprach zarathustra on the online public access catalog of the université catholique de louvain (a vtls opac), with results frbrized. selecting the first work yields a display which, when frbrized, shows a list of expressions; any part of the tree may be expanded to display manifestations, and item-level records follow. evaluating the impact of the long-s upon 18th-century encyclopedia britannica automatic subject metadata generation results. sam grabus. information technology and libraries | september 2020. https://doi.org/10.6017/ital.v39i3.12235. sam grabus (smg383@drexel.edu) is an information science phd candidate at drexel university’s college of computing and informatics, and a research assistant at drexel’s metadata research center. this article is the 2020 winner of the lita/ex libris student writing award. © 2020.
abstract this research compares automatic subject metadata generation when the pre-1800s long-s character is corrected to a standard < s >. the test environment includes entries from the third edition of the encyclopedia britannica, and the hive automatic subject indexing tool. a comparative study of metadata generated before and after correction of the long-s demonstrated an average of 26.51 percent potentially relevant terms per entry omitted from results if the long-s is not corrected. results confirm that correcting the long-s increases the availability of terms that can be used for creating quality metadata records. a relationship is also demonstrated between shorter entries and an increase in omitted terms when the long-s is not corrected. introduction the creation of subject metadata for individual documents has long been known to support standardized resource discovery and analysis by identifying and connecting resources with similar aboutness.1 in order to address the challenges of scale, automatic or semi-automatic indexing is frequently employed for the generation of subject metadata, particularly for academic articles, where the abstract and title can be used as surrogates in place of indexing the full text. when automatically generating subject metadata for historical humanities full texts that do not have an abstract, anachronistic typographical challenges may arise. one key challenge is that presented by the historical “long-s” < ſ >. in order to account for these idiosyncrasies, there is a need to understand the impact that they have upon the automatic subject indexing output. addressing this challenge will help librarians and information professionals to determine whether or not they will need to correct the long-s when automatically generating subject metadata for full-text pre-1800s documents.
the problem of the long-s in optical character recognition (ocr) for digital manuscript images has been discussed for decades.2 many scholars have researched methods for correcting the long-s through the use of rule-based algorithms or dictionaries.3 while the problem of the long-s is well-known in the digital humanities community, automatic subject metadata generation for a large corpus of pre-1800s documents is rare, as is research about the application and evaluation of existing automatic subject metadata generation tools on 18th-century documents in real-world information environments. the impact of the long-s upon automatic subject metadata generation results for pre-1800s texts has not been extensively explored. the research presented in this paper addresses this need. the paper reports results from basic statistical analysis and visualization using the helping interdisciplinary vocabulary engineering (hive) tool automatic subject indexing results, before and after the correction of the historical long-s in the 3rd edition of the encyclopedia britannica. background work was conducted over the summer and fall of 2019, and the research presented was conducted during winter 2020. the work was motivated by current work on the “developing the data set of nineteenth-century knowledge” project, a national endowment for the humanities collaborative project between temple university’s digital scholarship center and drexel university’s metadata research center.
the grant is part of a larger project, temple university’s “19th-century knowledge project,” which is digitizing four historical editions of the encyclopedia britannica.4 the next section of this paper presents background covering the historical encyclopedia britannica data, the automatic subject metadata generation tool used for this project, a brief background of “the long-s problem,” and the distribution of encyclopedia entry lengths in the 3rd edition. the background section will be followed by research objectives and method supporting the analysis. next, the results are presented, demonstrating prevalence of terms omitted from the automatic subject metadata generation results if the long-s is not corrected to a standard small < s > character, as well as the impact of encyclopedia entry length upon these results. the results are followed by a contextual discussion, and a conclusion that highlights key findings and identifies future research. background indexing for the 19th-century knowledge project the 19th-century knowledge project, an neh-funded initiative at temple university, is fully digitizing four historical editions of the encyclopedia britannica (the 3rd, 7th, 9th, and 11th). the long-term goal of the project is to analyze the evolving conceptualization of knowledge across the 19th century.5 the 3rd edition of the encyclopedia britannica (1797) is the earliest edition being digitized for this project. the 3rd edition consists of 18 volumes, with a total of 14,579 pages, and individual entries ranging from four to over 150,000 words. for each individual entry, researchers at temple have created individual tei-xml files from the ocr output. in order to enrich accessibility and analysis across this digital collection, the knowledge project will be adding controlled vocabulary subject headings into the tei headers of each encyclopedia entry xml file. 
considering the size of this corpus, both in terms of entry length and number of entries, automatic subject metadata generation will be required for the creation of this metadata. the knowledge project will employ controlled vocabularies to replace or complement naturally extracted keywords for this process. using controlled vocabularies adheres to metadata semantic interoperability best practices, ensures representation consistency, and helps to bypass linguistic idiosyncrasies of these 18th and 19th century primary source materials.6 we selected two versions of the library of congress subject headings (lcsh) as the controlled vocabularies for this project. lcsh was selected due to its relational thesaurus structure, multidisciplinary nature, and continued prevalence in digital collections due to its expressiveness and status as the largest general indexing vocabulary.7 in addition to the headings from the 2018 edition of lcsh, headings from the 1910 lcsh are also implemented in order to provide a more multi-faceted representation, using temporally-relevant terms that may have been removed from the contemporary lcsh. the tool applied for this process is hive, a vocabulary server and automatic indexing application.8 hive allows the user to upload a digital text or url and select one or more controlled vocabularies, and it performs automatic subject indexing by mapping naturally extracted keywords to the available controlled vocabulary terms. hive was initially launched as an imls linked open vocabulary and indexing demonstration project in 2009. since that time, hive has been further developed, with the addition of more controlled vocabularies, user interface options, and the rake keyword extraction algorithm.
the rake keyword extraction algorithm has been selected for this project after a comparison of topic relevance precision scores for three keyword extraction algorithms.9 the long-s problem early in our metadata generation efforts, we discovered that the 3rd edition of the encyclopedia britannica employs the historical long-s. originating in early roman cursive script, the long-s was used in typesetting up through the 18th century, both with and without a left crossbar. by the end of the 18th century, the long-s fell out of use with printers.10 as outlined by lexicographers of the 17th and 18th centuries, the rules for using the long-s were frequently vague, complicated, inconsistent over time, and varied according to language (english, french, spanish, or italian).11 these rules specified where in a word the long-s should be used instead of a short < s >, whether it is capitalized, where it may be used in proximity to apostrophes, hyphens, and the letters < f >, < b >, < h >, and < k >; and whether it is used as part of a compound word or abbreviation.12 this is further complicated by the inclusion of the half-crossbar, which occasionally results in two consequences: (a) the long-s may be interpreted by ocr as an < f >, and (b) an < f > may be interpreted by ocr as a long-s. figure 1 shows an example from the 3rd edition entry on russia, in which the original text specifies “of” (line 1 in top figure), yet the ocr output has interpreted the character as a long-s. the long-s may also occasionally be interpreted by the ocr as a lowercase < l >, such as the “univerlity of dublin” in the 3rd edition entry on robinson (the most rev sir richard). these complications and inconsistencies are challenges when developing python rules for correcting the long-s in an automated way, and even preexisting scripts will need to be adapted for individual use with a particular corpus. figure 1.
example from the 3rd edition entry on russia, comparing the original use of a letter < f > in “of” to the ocr output of the same passage, which mistakenly interprets the character as a long-s. despite the transition away from the long-s towards the end of the 18th century, the 3rd edition of the encyclopedia britannica (published in 1797) implements the long-s throughout, with approximately 100,594 instances of the long-s in the ocr output. when performing metadata generation with the hive tool on the ocr output for an entry, the long-s is most often interpreted by the automatic metadata generation tool as an < f >, which can result in (a) inaccurate keyword extraction (e.g., russians → ruffians), and (b) essential topics becoming unidentifiable when extracted keywords are mapped to controlled vocabulary terms; hive will subsequently omit them from the results because they cannot be mapped. figure 2 provides a truncated view of long-s words in the 3rd edition entry on rum, which are subsequently removed from the pool of automatically extracted keywords when performing the automatic subject indexing sequence in hive. using keyword extraction algorithms that are largely dependent upon term frequencies, automatic subject indexing for an entry on rum may be substantially hindered when meaningful and frequently occurring words such as sugar and yeast are removed. figure 2. examples of the long-s in the 3rd edition encyclopedia britannica entry on rum. using this example entry, the automatic subject indexing results were compared using python to determine which terms only appear when the long-s has been corrected to the standard < s >. the comparison showed that 16 total terms no longer appeared in the results when the long-s was not corrected to a standard < s >: ten terms using the 2018 lcsh, and six terms using the 1910 lcsh.
these omitted results included the terms sugar and yeast. the next section will discuss the encyclopedia entry word count for this corpus, and the possible impact that this may have upon automatic subject indexing between corrected and uncorrected long-s instances. encyclopedia entry lengths consistent with other encyclopedia britannica editions in the 18th and 19th centuries, the encyclopedia entries in the 3rd edition vary substantially in length. a convenience sample of 3,849 3rd edition entries ranging in length from 2 to 202,848 words demonstrated an arithmetic mean of 826.60 words and a median word count of 71. as shown in figure 3, this indicates a significant skew towards shorter entry lengths. for the vast majority of encyclopedia entries in this corpus, a low total word count may amplify the impact of the long-s upon automatic subject indexing results, given the importance of term availability and frequency for keyword extraction algorithms. figure 3. scatterplot of word count for a convenience sample of 3,849 3rd edition encyclopedia britannica entries. large-scale metadata generation requires time, labor, and resources, and it becomes more costly when accounting for the complications of correcting the long-s for a particular corpus. library and information professionals working with digital humanities resources will need to understand the impact of correcting or not correcting the long-s in the corpus before designating resources and developing a protocol for generating the automatic or semi-automatic metadata for full-text resources. this includes understanding whether or not the length of each individual document will affect the degree of long-s impact upon the results. this challenge, and the issues reviewed above, are addressed in the research presented below.
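the kind of rule-based python correction discussed above can be sketched in a few lines. this is a hypothetical simplification, not the project's actual script: the function names and the tiny known-word lexicon are invented for illustration, and a real corpus would need the fuller disambiguation rules described earlier.

```python
# Hypothetical sketch of rule-based long-s cleanup; function names and
# the KNOWN_WORDS lexicon are illustrative, not the project's code.

# A literal unicode long-s (U+017F) maps directly to "s".
def normalize_long_s(text: str) -> str:
    return text.replace("\u017f", "s")

# OCR often emits "f" where the page had a long-s. One simple
# heuristic: if swapping a single "f" for "s" turns an unknown token
# into a known word, prefer the known word.
KNOWN_WORDS = {"sugar", "yeast", "russia", "of", "university"}

def repair_f_for_s(token: str, lexicon: set = KNOWN_WORDS) -> str:
    if token in lexicon:
        return token  # already a known word; leave it alone
    for i, ch in enumerate(token):
        if ch == "f":
            candidate = token[:i] + "s" + token[i + 1:]
            if candidate in lexicon:
                return candidate
    return token  # no safe single-swap correction found
```

with this sketch, `normalize_long_s("ſugar")` yields `"sugar"` and `repair_f_for_s("yeaft")` yields `"yeast"`; the reverse confusion shown in figure 1 (a true "f" read as a long-s) would need the opposite swap plus context, which is why even preexisting scripts must be adapted per corpus.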
objectives the overriding goal of this work is to determine the prevalence of omitted terms in automatic subject indexing results when the long-s is not corrected in the 3rd edition entries of the encyclopedia britannica. research questions: 1. what is the average number of terms that are omitted from automatic subject indexing results when the long-s is not corrected to a standard < s >? 2. how does the encyclopedia entry length affect the number of terms that are omitted when the long-s is not corrected to a standard < s >? this analysis will approach these goals by performing a comparative analysis of automatic subject indexing results to determine the number of terms that are omitted from the results when the long-s is not corrected to a standard letter < s >. basic descriptive statistics are generated to determine central tendency. the quantity of terms omitted is then compared with encyclopedia entry word counts. these objectives were shaped by collaboration between drexel university’s metadata research center and temple university’s digital scholarship center. the next section of this paper will report on methods and steps taken to address these objectives. methods we approached this research by performing a comparative analysis of subject metadata generated both before and after the correction of the historical long-s in the 3rd edition of the encyclopedia britannica. the hive tool was used to automatically generate the subject metadata. descriptive statistics were applied, and visualizations produced from the results were also examined to identify trends. figure 4. the 30 encyclopedia britannica 3rd edition entries randomly selected for this study, sorted in ascending order by their word counts. the protocol for performing this research involved the following steps: 1. compile a sample for testing: 1.1.
a random sample of 30 encyclopedia entries was identified from a convenience sample of entries comprising the letter s volumes of the 3rd edition. the entries range in length from 6 to 6,114 words, and the median word count for entries in this sample is 99 words.
1.2. the sample of terms selected for this study and their respective word counts are visualized in figure 4.
1.3. for each entry, the long-s terms in the original xml file were extracted to a list.
2. perform an automatic subject indexing sequence upon the entries to generate lists of terms:
2.1. using the 2018 and 1910 versions of the lcsh.
2.2. with the fixed maximum of subject heading results set to 40: 20 maximum terms returned with the 2018 lcsh, and 20 maximum terms returned with the 1910 lcsh.
2.3. before long-s correction and after long-s correction, using the oxygen xml editor tei-to-txt transformation.
3. perform an outer join on python data frames between the terms generated when the long-s has been corrected and the terms generated when it has not. the resulting left outer join list displays terms that are omitted from the automatic indexing results if the long-s is not corrected to a standard small < s >. the quantity of terms omitted is recorded for comparison.
4. analysis: descriptive statistics were generated to determine central tendency for the number and percentage of words omitted when the long-s is not corrected. the quantity of terms omitted is also visualized in a continuous scatterplot against the corresponding word counts, to demonstrate that the quantity of terms omitted when the long-s is not corrected appears to relate to the length of the document being automatically classified.
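step 3 above, the outer join between corrected and uncorrected term lists, can be sketched with pandas. the term lists below are illustrative stand-ins, not the study's actual hive output:

```python
# sketch of step 3: identify subject terms omitted when the long-s is
# not corrected, via an outer join on two term lists (illustrative data).
import pandas as pd

corrected = pd.DataFrame({"term": ["sugar", "yeast", "salt", "soil"]})
uncorrected = pd.DataFrame({"term": ["salt", "soil"]})

# indicator=True adds a "_merge" column; rows marked "left_only" are
# terms returned only when the long-s was corrected, i.e., terms
# omitted from the uncorrected results
joined = corrected.merge(uncorrected, on="term", how="outer", indicator=True)
omitted = joined.loc[joined["_merge"] == "left_only", "term"].tolist()

print(omitted)        # terms lost without long-s correction
print(len(omitted))   # the per-entry count recorded for comparison
```

the `len(omitted)` count is the per-entry quantity that the analysis step then summarizes with descriptive statistics.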
results

the results report the prevalence of omitted terms when the long-s is not corrected to a standard < s >, as well as a visualization of the number of terms omitted as it relates to encyclopedia entry length. for each of the 30 sample entries automatically indexed with hive, a fixed maximum of 40 terms was returned: a maximum of 20 terms using the 2018 lcsh, and a maximum of 20 terms using the 1910 lcsh. as seen in table 1, central tendency is measured using the arithmetic mean and median, along with the standard deviation and range. the average number of terms omitted from an entry’s results is 6.73, and the average percentage of terms omitted from an entry’s results is 26.51 percent, with the 2018 and 1910 editions of the lcsh performing at similar rates. the full results are displayed in appendix a.

table 1. measures of centrality, standard deviation, range, and percentage for the quantity of terms omitted when the long-s is not corrected to a standard < s >, rounded to the hundredth. for each entry, a maximum of 40 terms were returned: 20 using the 2018 lcsh and 20 using the 1910 lcsh. the total results returned vary according to entry length; these totals are reported in appendix b. (n = 30 entries.)

for each entry in the sample, the results in appendix a display the total words omitted when the long-s is not corrected, the number of 2018 lcsh terms omitted, the number of 1910 lcsh terms omitted, and the encyclopedia entry word count. figure 5 visualizes the total number of terms omitted for each entry when the long-s is not corrected, demonstrating an increase in terms omitted for entries with lower word counts. these results are broken down by vocabulary in figure 6, demonstrating that both vocabularies used to generate these results indicate a significant increase in omitted terms for shorter entries.
measure | both vocabularies | 2018 lcsh | 1910 lcsh
average, terms omitted | 6.73 | 3.67 | 3.07
median, terms omitted | 5 | 3 | 2
standard deviation | 6.53 | 3.84 | 3.17
range, terms omitted | 0-24 | 0-13 | 0-11
average percentage, omitted terms | 26.51% | 27.51% | 24.28%
median percentage, omitted terms | 22.36% | 20.00% | 19.09%

figure 5. number of automatic subject indexing terms omitted when the long-s is not corrected to a standard < s >, compared by encyclopedia entry word count.

figure 6. number of automatic subject indexing terms omitted when the long-s is not corrected to a standard < s >, compared by encyclopedia entry word count and separated by controlled vocabulary version.

discussion

the analysis above presents measures of centrality for the quantity of terms omitted if the long-s is not corrected to a standard < s > prior to automatic subject indexing using hive, as well as a visualization representing the relationship between encyclopedia entry word count and the number of terms omitted. although researchers have identified challenges with the long-s and have focused a great deal on the technologies and methods used to correct it, there is still limited work examining the results of not correcting the long-s character when performing an automatic subject indexing sequence. this research demonstrated an average of 6.73 potentially relevant terms omitted from automatic indexing results when the long-s is not corrected, accounting for an average of 26.51 percent of the total results, with an approximately equal distribution of omitted terms across the two controlled vocabulary versions used.
when the quantity of terms omitted is visualized using a continuous scatterplot, the results also demonstrate a significant increase in omitted terms for shorter entries, with longer entries less affected. these results reflect the impact of term frequency and total word count in keyword extraction and automatic subject indexing, with longer documents having a greater pool of total terms from which to identify key terms.

considering the complexities and similarities of the typographical characters in the original manuscript, the ocr process for this corpus occasionally confuses the letters < s >, < f >, < r >, and < l >. as a result, an occasional long-s word in this study did not originally contain an < s > (e.g., sor instead of for). correction of these long-s ocr errors requires the development of a dictionary-based script. an additional complication of this research is that the corrected ocr output for the encyclopedia entries still contains a few errors not related to the long-s, which prevent the mapping of a term to any controlled vocabulary term (e.g., in the entry on sepulchre, the ocr output for the term palestine was palestinc).

these results are specific to this particular corpus of 3rd edition encyclopedia britannica entries, but it is very likely that testing another set of pre-1800s documents containing the long-s would also illustrate that, for best results with any algorithm or tool, the long-s needs to be corrected. the results are also specific to the two versions of the lcsh used, the 1910 lcsh and the 2018 lcsh, which are available in the hive tool. the 1910 version is key for the time period being studied, and the more contemporary 2018 version has supported additional analysis of the impact of the long-s. both of these vocabularies are important to the larger 19th-century knowledge project.
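the dictionary-based correction mentioned above can be sketched as follows. this is not the project's actual script; the wordlist and the single f-to-s substitution rule are illustrative assumptions (a real script would also handle the r and l confusions the article notes):

```python
# minimal sketch of dictionary-based long-s correction: when ocr has
# rendered a long-s as "f" (e.g., "fugar" for "sugar"), try substituting
# "s" for each "f" and keep the first variant found in a known wordlist.
# WORDLIST is a tiny illustrative stand-in for a real dictionary.
WORDLIST = {"sugar", "yeast", "same", "fame", "for", "sat"}

def correct_long_s(token: str) -> str:
    if token in WORDLIST:
        return token  # already a valid word; leave it alone
    # try replacing each "f" with "s", one position at a time
    for i, ch in enumerate(token):
        if ch == "f":
            candidate = token[:i] + "s" + token[i + 1:]
            if candidate in WORDLIST:
                return candidate
    return token  # no dictionary-backed correction found

print(correct_long_s("fugar"))  # corrected to a dictionary word
print(correct_long_s("fame"))   # valid word, left untouched
```

checking the token against the dictionary first is what prevents the script from mangling genuine f-words such as "fame," the core difficulty the article describes.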
it should be noted that while the lcsh is updated weekly, we were limited to what is available via the hive tool; any discrepancies that may be found with the 2020 lcsh will very likely have a minimal effect upon metadata generation results. the 2020 lcsh will be incorporated into hive soon and can be explored in future research.

conclusion and next steps

the objective of this research was to determine the impact of correcting the long-s in pre-1800s documents when performing an automatic metadata generation sequence using keyword extraction and controlled vocabulary mapping. this was accomplished by performing an automatic subject indexing sequence using the hive tool, followed by a basic statistical analysis to determine the quantity of terms omitted from the results when the long-s is not corrected to a standard < s >. the number of omitted terms was also compared with encyclopedia entry word counts and visualized to demonstrate a significant increase in omitted terms for shorter encyclopedia entries. the study was conclusive in confirming that the correction of the long-s is a critical part of our workflow. the significance of this research is that it demonstrates the necessity of correcting the long-s prior to performing automatic subject indexing on historical documents. beyond the correction of the long-s, the larger next steps for this project are to continue to explore automatic metadata generation for this corpus. these next steps include the comparison of results using contemporary vs. historical vocabularies, streamlining a protocol for bulk classification procedures, and integrating terms into the tei-xml headers.
the research presented here can inform other digital humanities and even science-oriented projects, where researchers may not be aware of the impact of the long-s on automatic metadata generation not only for subjects but also for named entities, particularly when automatic approaches with controlled vocabularies are desired.

acknowledgements

the author thanks dr. jane greenberg and dr. peter logan for their guidance. the author acknowledges the support of neh grant #haa-261228-18.

appendix a

entry term | total words omitted | 2018 lcsh terms omitted | 1910 lcsh terms omitted | encyclopedia entry word count
sardis | 24 | 13 | 11 | 381
suction | 24 | 13 | 11 | 38
stylites, pillar saints | 19 | 13 | 6 | 199
shadwell | 14 | 10 | 4 | 211
salicornia | 13 | 6 | 7 | 254
sepulchre | 11 | 3 | 8 | 348
sitta nuthatch | 9 | 5 | 4 | 620
sprat | 9 | 3 | 6 | 475
serapis | 8 | 5 | 3 | 587
strada | 8 | 1 | 7 | 189
shoad | 7 | 4 | 3 | 463
sign | 7 | 5 | 2 | 68
shooting | 6 | 3 | 3 | 6114
strata | 6 | 3 | 3 | 2920
stewartia | 5 | 4 | 1 | 72
subclavian | 5 | 3 | 2 | 20
schweinfurt | 4 | 2 | 2 | 84
scroll | 4 | 2 | 2 | 45
spalatro | 4 | 3 | 1 | 99
special | 4 | 3 | 1 | 24
samogitia | 3 | 2 | 1 | 112
shakespeare | 3 | 0 | 3 | 3855
sinapism | 2 | 1 | 1 | 25
sect | 1 | 1 | 0 | 20
severino | 1 | 1 | 0 | 38
shaddock | 1 | 1 | 0 | 6
scarlet | 0 | 0 | 0 | 65
shallop, shalloop | 0 | 0 | 0 | 42
soldanella | 0 | 0 | 0 | 56
spoletto | 0 | 0 | 0 | 99

appendix b

(n = 30 entries)

| average terms returned | median terms returned
corrected | 24.77 / 40 possible | 28 / 40 possible
uncorrected | 26.47 / 40 possible | 29 / 40 possible
2018 lcsh corrected | 14.10 / 20 possible | 19 / 20 possible
2018 lcsh uncorrected | 13.47 / 20 possible | 18.5 / 20 possible
1910 lcsh corrected | 11.27 / 20 possible | 11 / 20 possible
1910 lcsh uncorrected | 10.13 / 20 possible | 9 / 20 possible

editorial board thoughts: a&i databases: the next frontier to discover
mark dehmlow
information technology and libraries | march 2015

i think it is fair to say that the discovery technology space is a relatively mature market segment, not complete, but mature.
much of the easy-to-negotiate content has been negotiated, and many of the systems on the market are above or approaching a billion records. this may seem like a lot, but there is a whole slice of tremendously valuable content still not fully available across all platforms, namely specialized subject abstracting and indexing (a&i) database content. this content has significant value for the discovery community: many of those databases go further back than content pulled from journal publishers or full-text databases. equally important, they represent a portion of humanities and social sciences content that is less represented in discovery systems as compared to stem content.

for vendors of a&i content, the concerns are clear and realistic. unlike journal publishers, whose metadata is meant to direct users to their main content (full text), for a&i publishers the metadata is the main content. according to a recent nfais report, a major concern is that if they include their content in discovery systems, they “risk loss of brand awareness,” and the implication is that institutions will be more likely to cancel those subscriptions.1 the focus therefore seems to have been on how to optimize the visibility of their content in discovery systems before being willing to share it. in addition to the nfais report, some of the conversations i have seen on the topic focus on wanting discovery system providers to meet a more complex set of requirements that will maximize leveraging the rich metadata contained in those resources, the idea being that utilizing that metadata in specific ways will increase the visibility of the content.
in principle, i think it is a commendable goal to maximize the value of the comprehensive metadata a&i records contain, and the complexities of including a&i data in discovery systems need to be carefully considered, namely blending multiple subject and authority vocabularies and ensuring that metadata records are appropriately balanced with full text in the relevancy algorithm. but i also worry that setting too many requirements that are too complicated will lead to delayed access and biased search results. it is important that this content is blended in a meaningful way, but determining relevancy is a complex endeavor, and it is critically important for relevancy to be unbiased from the content provider perspective and to focus instead on the user, their query, and the context of their search.

another concern i have heard articulated is that results in discovery services are unlikely to be as good as native a&i systems because of the already mentioned blending issues. this is likely to be true, but i think it is critical to focus on the purpose of discovery systems. as donald hawkins recently wrote in a summary of a workshop called “information discovery and the future of abstracting and indexing services,” “a&i services provide precision discipline-specific searching for expert researchers, and discovery services provide quick access to full text.”2 hawkins indicates that discovery systems are not meant to be sophisticated search tools, but rather a quick means to search a broad range of scholarly resources, and, i think, sometimes a quick starting point for researchers.

(mark dehmlow (mark.dehmlow@nd.edu), a member of the ital editorial board, is program director, library information technology, university of notre dame, south bend, in.)
because of the nature of merging billions of scholarly records into a single system, discovery systems will never be able to provide the same experience as a native a&i system, nor should they. over time, they may become better tuned to provide a better overall experience for the three different types of searchers we have in higher education: novice users like undergraduates looking for a quick resource, advanced users like graduate students and faculty looking for more comprehensive topical coverage, and expert users like librarians who want sophisticated search features to home in on the perfect few resources. many of the discovery systems are working on building these features, but the industry will take time to solve this problem, and i tend to look at things through the lens of our end users: non-inclusion of this content directly impacts their overall discovery experience.

one might ask, if the discovery system experience isn’t as precise and complete as the native a&i experience, why bother? in addition to broadening the subject scope by including much of the narrower and deeper subject metadata, there is also the importance of serendipitous finding. that content, in the context of a quick user search, may drive the user to just the right thing that they need. in addition, my belief is that with that content we can build search systems that are deeper than google scholar, and by extension provide our end users with a superior search experience. and so i advocate for innovating now instead of waiting to work out all of the details. i am not suggesting moving forward callously, but swiftly. the work that niso has done on the open discovery initiative has resulted in some good recommendations about how to proceed.
for example, they have suggested two usage metrics that could be valuable for measuring a&i content use in discovery systems: search counts (by collection and customer for a&i databases) and result clicks (the number of times an end user clicks on a content provider’s content in a set of results).3 while these types of metrics are aligned with the kinds of measures by which libraries evaluate a&i database usage, i think they don’t really say much about the overall value of the resources themselves. sometimes in the library profession, our obsession with counting stuff loses connection with collecting metrics that actually say something about impact. of the two counts, i could see the result clicks as having more value: knowing that a user found something of interest from a specific resource at the very least indicates that it led the user some place. the measure of search counts by collection is less useful. at best it indicates that the resource was searched, but it tells us nothing about who was searching for an item, what they found, or what they subsequently did with the item once they found it.

i do think we in libraries need to consider the bigger picture. regardless of the number of searches (which doesn’t really tell us anything anyway), we need to recognize the value alone of including the a&i content, and instead of trying to determine the value of the resource by the number of times it was searched, focus more on the breadth of exposure that content is getting by inclusion in the discovery system. i think a more useful technical requirement for discovery providers would be to provide pathways to specific a&i resources within the context of a user’s search, not dissimilar to how google places sponsored content at the top of its search results: a kind of promotional widget.
in this case, using metadata returned from the query, the systems could calculate which one or two specific resources would guide the user to more in-depth research. by virtue of a resource's inclusion in the discovery system, it could become part of the promotional widget. this would guide users back to the native a&i resource, which both libraries and a&i providers want, and it would do so in a more intuitive and meaningful way for the end user.

all of the parties involved in the discovery discussion can bring something to the table if we want to solve these issues in a timely way. i hope that a&i publishers and discovery system providers make haste and get agreements underway for content sharing, and i would recommend that, instead of focusing on requiring finished implementations based on complex requirements before loading content, both should focus on some achievable short- and long-term goals. integrating a&i content perfectly will take some time to complete, and the longer we wait, the longer our users have a suboptimal discovery experience. discovery providers need to make long-term commitments to developing mechanisms that satisfy usage metrics for a&i content, although i would recommend defining measures that have true value. a&i providers should be measured in their demands: while their stake in system integration is real, there is a risk of content providers vying for their content to be preferred, when relevancy neutrality is paramount for a discovery system to be effective. i think it is worth lauding the efforts of a few trailblazing a&i publishers, such as thomson reuters and proquest, who have made agreements with some of the discovery providers and are already sharing their a&i content, providing some precedent for sharing a&i content.
lastly, libraries and knowledge workers need to develop better means of calculating overall resource value, moving beyond strict counts to ways of determining the overall scholarly and pedagogical impact of those resources, and they need to let the fact alone that an a&i publisher shares its data with a discovery provider indicate significant value for the resource.

references

1. nfais, recommended practices: discovery systems (nfais, 2013), https://nfais.memberclicks.net/assets/docs/bestpractices/recommended_practices_final_aug_2013.pdf.

2. donald t. hawkins, “information discovery and the future of abstracting and indexing services: an nfais workshop,” against the grain, 2013, http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.

3. open discovery initiative working group, open discovery initiative: promoting transparency in discovery (baltimore: niso, 2014), http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_odi.pdf.

editor’s comments
bob gerrity
information technology and libraries | september 2013

this month’s issue

in this month’s issue, we welcome back the president’s message column, with incoming lita president cindi trainor describing upcoming lita events, priorities, and opportunities for members. university of denver mlis candidate gina schlesselman-tarango contributes a compelling piece describing the background, use, and potential library application of searchable signatures in web 2.0 applications such as instagram. jenny emanuel from the university of illinois reports on the complex relationship that millennial academic librarians have with technology. kristina l. southwell and jacquelyn slater from the university of oklahoma present the findings of a study evaluating the accessibility of special collections finding aids to screen readers for visually impaired users.
ping fu from central washington university and moira fitzgerald from yale university look at the potential effects of cloud-based next-generation library services platforms on staffing models for systems and technical-services departments. visiting the discovery side of library services, megan johnson from appalachian state university reports on usability testing of appalachian’s “one box” integrated articles and catalog search, using innovative interfaces’ encore discovery service.

speaking of usability, i had the chance recently to observe a usability testing session for my library’s website, and was reminded of the importance of designing library websites and delivering web-based library services that will actually be of value to our users, delivered with their context in mind rather than ours. my library, like many others, has a website rich in content and complexity and organized around our structure. to the user i was observing, the complexity and library-centric organization clearly were obstacles to the rich content we offer. an undergraduate art history major, she was primarily interested in library resources and services that were directly connected to her coursework and that were accessible from the university’s learning management system (lms). she valued the convenience of direct access from the lms to library-managed course readings and past exam papers. but, when asked to navigate to the same resources using the library homepage as a starting point rather than the lms, she quickly became frustrated and confused by the overload of search options with (to her) confusing labels. she was further stymied by our proclivity to make things more complex than they need to be (or should be). a simple example: a common occurrence at the beginning of semester is that students with outstanding library fines/fees are blocked from registering for classes.
rather than providing a simple, direct “resolve my library fees” link, with clear instructions on how to fix the problem as quickly as possible, we instead provide pages of information about how and why the fines/fees were calculated, with no link to a solution to the problem at hand. my takeaways from the session were that (1) our website needs to be radically simplified and (2) we should be focusing on designing and delivering services that can be embedded in the context of the user’s natural workflows, not the library’s. easier said than done, of course.

reviewers needed

the ital editorial board has room for a couple of additional members, to help us keep up with incoming article submissions. if you have a passion for library technology, a willingness to undertake a few reviews each year, and are a member of lita (or willing to join), please send me an e-mail indicating your interest and area(s) of expertise. as always, suggestions and feedback on ital are welcome, at the e-mail address above.

(bob gerrity (r.gerrity@uq.edu.au) is university librarian, university of queensland, australia.)

technical communications

isad/solinet to sponsor institute

“networks and networking ii: the present and potential” is the theme of an isad institute to be held at the braniff place hotel on february 27-28, 1975, in new orleans. the sponsors are the information science and automation division of ala and the southeastern library network (solinet). this second institute on networking will be an extension of the previous one held in new orleans a year ago; the ground covered in that previous institute will be the point of departure for “networks ii.” the purpose of the previous institute was to review the options available in networking, to provide a framework for identifying problems, and to suggest evaluation strategies to aid in choosing alternative systems.
while the topics covered in the previous institute will be briefly reviewed in this one, some speakers will take different approaches to the subject of networking, while other speakers will discuss totally new aspects. in addition to the papers given and the resultant questions and answers from the floor, a period of round table discussions will be held during which the speakers can be questioned on a person-to-person basis. a new feature of isad institutes now being planned will be the presence of vendors’ exhibits. arrangements are being made with the many vendors and manufacturers whose services are applicable to networking to exhibit their products and systems. it is hoped that many of them will be interested in responding to this opportunity.

the program will include:

“a systems approach to selection of alternatives”: resource sharing, components, communications options, planning strategy. joseph a. rosenthal, university of california, berkeley.

“state of the nation”: review of current developments and an evaluation. brett butler, butler associates.

“the library of congress, marc, and future developments.” henriette d. avram, library of congress.

“data bases, standards and data conversions”: existing data bases, characteristics, standardization, problems. john f. knapp, richard abel & co.

“user products”: possibilities for product creation, the role of user products. maurice freedman, new york public library.

“on-line technology”: hardware and software considerations, library requirements, standards, cost considerations of alternatives. philip long, state university of new york, albany.

“publishers’ view of networks”: copyright, effect on publishers, effect on authorship, impact on jobbers, facsimile transmission. carol nemeyer, association of american publishers.

“national library of canada”: current and anticipated developments, cooperative plans in canada, international cooperation. rodney duchesne, national library of canada.
“administrative, legal, financial, organizational and political considerations”: actual and potential problems, organizational options, financial commitment, governance. fred kilgour, oclc.

registration will be $75.00 for members of ala and staff members of solinet institutions, $90.00 for nonmembers, and $10.00 for library school students. for hotel reservation information and registration blanks, contact donald p. hammer, isad, american library association, 50 e. huron st., chicago, il 60611; 312-944-6780.

316 journal of library automation vol. 7/4 december 1974

regional projects and activities

indiana cooperative library services authority

the first official meeting of the board of directors of the indiana cooperative library services authority (incolsa) was held june 4, 1974, at the indiana state library in indianapolis. a direct outgrowth of the cooperative bibliographic center for indiana libraries (cobicil) feasibility study project sponsored by the indiana state library and directed by mrs. barbara evans markuson, incolsa has been organized as an independent not-for-profit organization “to encourage the development and improvement of all types of library service.” to date, contracts have been signed by sixty-one public, thirteen academic, fourteen school, and five special libraries, a total of ninety-three libraries. incolsa is being funded initially by a three-year establishment grant from the u.s. office of education, library services and construction act (lsca) title i funds. officers are: president, harold baker, head of library systems development, indiana state university; vice-president, dr. michael buckland, assistant director for technical services, purdue university libraries; secretary, mary hartzler, head of catalog division, indiana state library; treasurer, mary bishop, director of the crawfordsville book processing center; three directors-at-large: phil hamilton, director of the kokomo public library; edward a.
howard, director of the evansville-vanderburgh county public library; and sena kautz, director of media services, duneland school corporation. stanford's ballots on-line files publicly available through spires september 16, 1974 the stanford university libraries automated technical processing system, ballots (bibliographic automation of large library operations using a timesharing system), has been in operation for twenty-two months and supports the acquisition and cataloging of nearly 90 percent of all materials processed. important components of the ballots operations are several on-line files accessible through an unusually powerful set of indexes. currently available are: a file of library of congress marc data starting from january 1, 1972 (with a gap from may to august 1972); an in-process file of individual items being purchased by stanford; an on-line catalog (the catalog data file) of all items cataloged through the system, whether copy was derived from library of congress marc data, was input from non-marc cataloging copy, or resulted from stanford's own original cataloging efforts; and a file of see, see also, and explanatory references (the reference file) to the catalog data file. in addition, during september and october 1974, the 85,000 bibliographic and holdings records (already in machine-readable form on magnetic tape) representing the entire j. henry meyer memorial undergraduate library were converted to on-line meyer catalog data and meyer reference files in ballots. these files are publicly available through spires (stanford public information retrieval system) to any person with a terminal that can dial up the stanford center for information processing's academic computer services computer (an ibm 360 model 67) and who has a valid computer account.
the marc file can be searched through the following index points: lc card number, personal name, corporate/conference name, and title. the in-process, catalog data, and reference files for stanford and for meyer can also be searched as spires public subfiles through the following index points: ballots unique record identification number, personal name, corporate/conference name, title, subject heading (catalog data and reference file records only), call number (catalog data and reference file records only), and lc card number. the title and corporate/conference name indexes are word indexes; this means that each word is indexed individually. search requests may draw on more than one index at a time by using the logical operators "and," "or," and "and not" to combine index values sought. if you plan to use spires to search these files, or if you would like more information, a publication called guide to ballots files may be ordered by writing to: editor, library computing services, s.c.i.p.-willow, stanford university, stanford, ca 94305. this document contains complete information about the ballots files and data elements, how to open an account number, and how to use spires to search ballots files. a list of ballots publications and prices is also available on request. as additional libraries create on-line files using ballots in a network environment, these files will also be available. these additions will be announced in jola technical communications. data base news interchange of aip and ei data bases a national science foundation grant (gn-42062) for $128,700 has been awarded to the american institute of physics (aip), in cooperation with engineering index (ei), for a project entitled "interchange of data bases." the grant became effective on may 1, 1974, for a period of fifteen months. the project is intended to develop methods by which ei and aip can reduce their input costs by eliminating duplication of intellectual effort and processing.
through sharing of the resources of the two organizations and an interchange of their respective data bases, aip and ei expect to improve the utilization of these computer-readable data bases. the basic requirement for the development of the interchange capability for computer-readable data bases is the establishment of a compatible set of data elements. each organization has unique data elements in its data base. it will therefore be necessary to determine which of the data elements are absolutely essential to each organization's services, which elements can be modified, and what other elements must be added. after the list of data elements has been established, it will be possible to write the specifications and programs for format conversions from aip to ei tape format and vice versa. simultaneously, there will be the development of language conversion facilities between ei's indexing vocabulary and aip's physics and astronomy classification scheme (pacs). it is also planned to investigate the possibility of establishing a computer program which can convert aip's indexing to ei's terms and vice versa. with the accomplishment of the above tasks, it will be possible to create new services and repackage existing services to satisfy the information demands in areas of mutual interest to engineers and physicists, such as acoustics and optics. eric data base users conference the educational resource information center (eric) held an eric data base users conference in conjunction with the 37th annual meeting of the american society for information science (asis) in atlanta, georgia, october 13-17, 1974. the eric data base users conference provided a forum for present and potential eric users to discuss common problems and concerns as well as interact with other components of the eric network: central eric, the eric processing and reference facility, eric clearinghouse personnel, and information dissemination centers.
although attendees have in the past been primarily oriented toward machine use of the eric files, all patterns of usage were represented at this conference, from manual users of printed indexes to operators of national on-line retrieval systems. a number of invited papers were presented dealing with subjects such as: • the current state and future directions of educational information dissemination. sam rosenfeld (nie), lee burchinal (nsf). • what services, systems, and data bases are available? marvin gechman (information general), harvey marron (nie). • the roles of libraries and industry, respectively, in disseminating educational information. richard de gennaro (university of pennsylvania), paul zurkowski (information industry association). several organizations (national library of canada, university of georgia, wisconsin state department of education) were invited to participate in "show and tell" sessions to describe in detail how they are using the eric system and data base. a status report covering eric on-line services for educators was presented by dr. carlos cuadra (system development corporation) and dr. roger summit (lockheed). interactive discussion groups covered a number of subjects including: • computer techniques-programming methods, use of utilities, file maintenance, search system selection, installation, and operation. • serving the end user of educational information. • introduction to the eric system-what tools, systems, and services are available and how are they used? • beginning and advanced sessions on computer searching the eric files. on-line terminals were used to demonstrate and explain use of machine capabilities. commercial services and developments scope data inc. ala train compatible terminal printers scope data inc. currently is offering a high-speed, nonimpact terminal printer for use in various interactive printing applications.
capability can be included in the series 200 printer as an extra-cost feature to print the eight-bit ascii character set for the ala character set with 176 characters. for further information contact alan g. smith, director of marketing, scope data inc., 3728 silver star rd., orlando, fl 32808. institute for scientific information puts life sciences data base on-line through system development corporation the institute for scientific information (isi) has announced that it will collaborate with system development corporation (sdc) to provide on-line, interactive, computer searches of the life sciences journal literature. scheduled to be fully operational by july 1, 1974, the isi-sdc service is called scisearch® and is designed to give quick, easy, and economical access to a large life sciences literature file. stressing ease of access, the sdc retrieval program, orbit, permits subscribers to conduct extremely rapid literature searches through two-way communications terminals located in their own facilities. after examining the preliminary results of their inquiries, searchers are able to further refine their questions to make them broader or narrower. this dialog between the searcher and the computer (located in sdc's headquarters in santa monica, california) is conducted with simple english-language statements. because this system is tied in to a nationwide communications network, most subscribers will be able to link their terminals to the computer through the equivalent of a local phone call. covering every editorial item from about 1,100 of the world's most important life sciences journals, the service will initially offer a searchable file of over 400,000 items published between april 1972 and the present. each month approximately 16,000 new items will be added until the average size of the file totals about one-half million items and represents two-and-one-half years of coverage.
to assure subscribers maximum retrieval effectiveness when dealing with this massive amount of information, the data base can be searched in several ways. included are searches by keywords, word stems, word phrases, authors, and organizations. one of the search techniques utilized-citation searching-is an exclusive feature of the isi data base. for every item retrieved through a search, subscribers can receive a complete bibliographic description that includes all authors, journal citation, full title, a language indicator, a code for the type of item (article, note, review, etc.), an isi accession number, and all the cited references contained in the retrieved article. the accession number is used to order full-text copies of relevant items through isi's original article tear sheet service (oats®). this ability to provide copies of every item in the data base distinguishes the isi service from many others. current library of congress catalog on-line for reference searches information dynamics corporation (idc) has agreed to collaborate with system development corporation (sdc) to provide reference librarians, researchers, and scholars with on-line interactive computer searches of all library materials being cataloged by the library of congress. scheduled to be fully operational as of october 1, 1974, the sdc-idc service is called sdc-idc/libcon and is designed to give quick, easy, and economical access to a large portion of the world's scholarly library materials. as in the isi service described above, the data base can be searched in several ways. included are compound logic searches by keywords, word stems, word phrases, authors, organizations, and subject headings for most english materials. one of the search techniques utilized-string searching-is an exclusive feature of sdc's orbit system.
keyword searching of cataloged items including all foreign materials processed by the library of congress is an exclusive feature of the idc data base not currently available in other on-line marc files. for individual items retrieved through a search, subscribers can receive a bibliographic description that includes authors, full title, an idc accession number, the lc classification number, and publisher information. standards the isad committee on technical standards for library automation invites your participation in the standards game editor's note: the tesla reactor ballot will be provided in forthcoming issues. to use, photocopy the ballot form, fill out, and mail to: john c. kountz, associate for library automation, office of the chancellor, the california state university and colleges, 5670 wilshire blvd., suite 900, los angeles, ca 90036. the procedure this procedure is geared to handle both reactive (originating from the outside) and initiative (originating from within ala) standards proposals to provide recommendations to ala's representatives to existing, recognized standards organizations. to enter the procedure for an initiative standards proposal you must complete an "initiative standards proposal" using the outline which follows: initiative standard proposal outline-the following outline is designed to facilitate review by both the committee and the membership of initiative standards proposals and to expedite the handling of the initiative standard proposal through the procedure. since the outline will be used for the review process, it is to be followed explicitly. where an initiative standard requirement does not require the use of a specific outline entry, the entry heading is to be used followed by the words "not applicable" (e.g., where no standards exist which relate to the proposal, this is indicated by: vi. existing standards. not applicable).
note that the parenthetical statements following most of the outline entry descriptions relate to the ansi standards proposal section headings to facilitate the translation from this outline to the ansi format. all initiative standards proposals are to be typed, double spaced on 8½" x 11" white paper (typing on one side only). each page is to be numbered consecutively in the upper right-hand corner. the initiator's last name followed by the key word from the title is to appear one line below each page number. i. title of initiative standard proposal (title). ii. initiator information (foreword). a. name b. title c. organization d. address e. city, state, zip f. telephone: area code, number, extension iii. technical area. describe the area of library technology as understood by initiator. be as precise as possible since in large measure the information given here will help determine which ala official representative might best handle this proposal once it has been reviewed and which ala organizational component might best be engaged in the review process. iv. purpose. state the purpose of the standard proposal (scope and qualifications). v. description. briefly describe the standard proposal (specification of the standard). vi. relationship to other standards. if existing standards have been identified which relate to, or are felt to influence, this standard proposal, cite them here (expository remarks). vii. background. describe the research or historical review performed relating to this standard proposal (if applicable, provide a bibliography) and your findings (justification). viii. specifications. (optional) specify the standard proposal using record layouts, mechanical drawings, and such related documentation aids as required in addition to text exposition where applicable (specifications of the standard). kindly note that the outline is designed to enable standards proposals to be written following a generalized format which will facilitate their review.
in addition, the outline permits the presentation of background and descriptive information which, while important during any evaluation, is a prerequisite to the development of a standard. tesla reactor ballot. identification number for standing requirement: ___. reactor information: name ___, title ___, organization ___, address ___, city ___, state ___, zip ___, telephone: area code ___, number ___, ext. ___. need (for this standard): for [ ] against [ ]. specification (as presented in this requirement): for [ ] against [ ]. can you participate in the development of this standard? no [ ] yes [ ]. reason for position: (use format of proposal. additional pages can be used if required.) the reactor ballot is to be used by members to voice their recommendations relative to initiative standards proposals. the reactor ballot permits both "for" and "against" votes to be explained, permitting the capture of additional information which is necessary to document and communicate formal standards proposals to standards organizations outside of the american library association. as you, the members, use the outline to present your standards proposals, tesla will publish them in jola-tc and solicit membership reaction via the reactor ballot. throughout the process tesla will insure that standards proposals are drawn to the attention of the applicable american library association division or committee. thus, internal review usually will proceed concurrently with membership review. from the review and the reactor ballot tesla will prepare a "majority recommendation" and a "minority report" on each standards proposal. the majority recommendation and minority report so developed will then be transmitted to the originator, and to the official american library association representative on the appropriate standards organization where it should prove a source of guidance as official votes are cast.
in addition, the status of each standards proposal will be reported by tesla in jola-tc via the standards scoreboard. the committee (tesla) itself will be nonpartisan with regard to the proposals handled by it. however, the committee does reserve the right to reject proposals which after review are not found to relate to library automation. input to the editor: we have been asked by the members of the ala interdivisional committee on representation in machine readable form of bibliographic information (marbi) to respond to your editorial in the june 1974 issue of the journal of library automation. this editorial dealt with the council of library resources' [sic] involvement in a wide range of projects, ranging from the sponsorship of a group which is attempting to develop a subset of marc for use in inter-library exchange of bibliographic data (cembi), to management of a project which has as its goal the creation of a national serials data base (conser), and, more recently, to the convening of a conference of library and a&i organizations to discuss the outlook for comprehensive national bibliographic control. you raised several legitimate questions: 1) has sufficient publicity been given to these activities of the council so that all, not just a few, libraries are aware of what is happening and have an opportunity to exert an influence on developments? and, 2) is the council bypassing existing channels of operation and communication? you also suggest that proposals from groups such as cembi be channeled through an official ala committee such as marbi for intensive review and evaluation. it should be pointed out that marbi is not charged with the development of standards. it acts to monitor and review proposals affecting the format and content of machine readable bibliographic data, where that data has implications for national or international use.
this applies to proposals emanating from cembi and conser as well as from other concerned groups. all indications to date are that the council is fully aware of marbi's role and will not bypass marbi. a number of members of marbi are also members of cembi, and marbi is represented on the conser project. also reassuring is the fact that, unless we allow lc to fall by the wayside in its role as the primary creator and distributor of machine readable data, any standards for format or content developed by a council-sponsored group will eventually be reflected in the marc records distributed by lc. the library of congress has issued a statement, published in the june 1974 issue of jola, to the effect that it will not implement any changes in the marc distribution system which are not acceptable to marbi. marbi and lc have worked out a procedure whereby all proposed changes to marc are submitted to marbi. they are then published in jola and distributed to members of the marc users discussion group for comments. comments are collected and evaluated by marbi and a report submitted to lc, with its recommendations. the marbi review process does not guarantee perfection and there is no assurance that everyone will be satisfied. compromise and expediency are the name of the game in this extremely complicated and uncharted area of standards for machine readable bibliographic data. however, the council has undoubtedly learned from the isbd(m) experience that it cannot make decisions which affect libraries without the greatest possible involvement of librarians. it is the feeling of the marbi committee members that the council intends to work with marbi in future projects which fall into marbi's area of concern. velma veneziano, marbi past chairperson; ruth tighe, chairperson. editor's note: it is gratifying to note that marbi's response reflects the opinions expressed in the june 1974 editorial.
the library community will doubtless be pleased to learn of clr's intention to work closely with marbi.-skm to the editor: as briefly discussed with you, your editorial in the june 1974 issue of jola is both admirable and disturbing (to me, at least). the problem of national leadership in the area of library automation is a critical problem indeed. being in the "boondocks" and far removed from the scene of action, i can only express to you my perception as events and activities filter through to me. i can remember as far back as 1957 when adi had a series of meetings in washington, d.c., trying to establish a national program for bibliographic automation. i have been through eighteen years of meetings, committees, conferences, etc. concerned with trying to develop a national plan for bibliographic automation and information storage and retrieval systems. i have worked with nsf, usoe, department of commerce, u.s. patent office, engineering and technical societies, dod agencies-the entire spectrum. i spent a good many years working in adi and asis, sla, and most recently ala. at no time were we able to make significant progress towards a national system. even the great airlie house conference did not produce any significant changes in the fragmented, competitive "non-system." it has only been in the recent past since clr has taken an aggressive posture that i am able to see the beginning of orderly development of a national automated bibliographic system. i certainly agree that any topic as critical as those being discussed by cembi should be in the public domain, but i also believe that the progress made by cembi would not have been possible without clr taking the initiative in getting these key agencies together. thank goodness someone quit talking and started doing something at the national level!
i sincerely believe that in the absence of a national library and with the current lack of legally derived authority in this arena, clr provides a genuine service to the total library community in establishing cembi. hopefully, your very excellent article (in the same issue of jola) on "standards for library automation ..." will help to put the entire issue of bibliographic record standards into perspective. as a former chemist and corrosion engineer, i am fully aware of the absolute necessity for technical standards. i am also fully aware of the necessity of developing technical standards through the process you outlined in your article. hopefully, clr action with cembi will expedite this laborious process and help to push our profession forward into the twentieth century. since we ourselves have not been able to do it through all these years, i am personally grateful that some group such as clr took the initiative and forced us to do what we should have done years ago. maryann duggan, slice office director editor's note: positive action and progressive movement are, of course, desirable and are often lacking in large organizations. however, positive action without communication of this action to the affected population can only be detrimental. on issues of the complexity of those addressed by cembi and conser, review by the library community is always useful, even though action may be temporarily delayed.-skm to the editor: on page 233 of the september issue of jola there is a report from the information industry association's micropublishing committee chairman (henry powell). he states that "... the committee spelled out several areas of concern to micropublishers which will be the subject of committee action ...." one of the concerns of the committee is that a z39 standards committee has recommended "standards covering what micropublishers can say about their products." (emphasis mine.)
as chairman of the z39 standards subcommittee which is developing the advertising standard referred to, i wish to point out that there is no intention on the part of the subcommittee to tell micropublishers what they can say nor what they may say about their products. the subcommittee, which is composed of representatives from three micropublishing concerns, two librarians, and myself, has from the beginning taken the view that the purpose of the standard would be to provide guidance for micropublishers and librarians alike. we are most anxious that no one feel that the subcommittee has any intention of attempting to use the standards mechanism to tell any micropublisher how he must design his advertisements. in addition it should be noted that no ansi standard is compulsory. carl m. spaulding, program officer, council on library resources decision-making in the selection, procurement, and implementation of alma/primo: the customer perspective jin xiu guo and gordon xu information technology and libraries | march 2023 https://doi.org/10.6017/ital.v42i1.15599 jin xiu guo (jiguo@fiu.edu) is associate dean for technical services, florida international university. gordon xu (gordon.xu@njit.edu) is associate university librarian for collections & information technology, new jersey institute of technology. © 2023. abstract this case study examines the decision-making process of library leaders and administrators in the selection, procurement, and implementation of ex libris alma/primo as their library services platform (lsp). the authors conducted a survey of libraries and library consortia in canada and the united states who have implemented or plan to implement alma.
the results show that most libraries use both a request for information (rfi) and a request for proposal (rfp) in their system selection process, but the vendor-offered training is insufficient for effective operation. one-third of the libraries surveyed are considering switching to open-source options for their next automation system. these insights can benefit libraries and library consortia in improving their technological readiness and decision-making processes. introduction with the exponential growth of digital information, libraries have been seeking innovative systems to manage electronic resources and provide collection services. the next-generation integrated library system (ils) should address both current challenges and future demands. with that in mind, new cloud-based commercial products have come into the market in recent years. ex libris alma, oclc worldshare, and innovative sierra are often referred to as library services platforms (lsps), in contrast to client-based ilss. selecting and implementing a new system from among these products is no small task. studies show that libraries might overlook the capacity of an ils to accommodate many functions and make a tough choice between sticking with the current vendor or switching to another before investing time and resources to migrate to a completely new system.1 libraries do not make these kinds of decisions in a rational manner, a process that involves clearly defining the problem, identifying and evaluating potential options, weighing the pros and cons of each option, considering an organization's values, goals, and preferences, making a choice based on a systematic analysis, and continuously reassessing and adjusting the decision as new information becomes available. as a result, a selected system might not be the best fit for a library's actual needs.2 library consortia also face a similar challenge, but in a more complex context.
for example, sharing cost, level of collaboration, and integration with other library applications can be quite different from a small library to a large research library. additionally, the requirement for security and scalability can vary among consortial members. ninety-four percent of academic libraries migrated their systems to alma in 2018 by joining a consortium.3 at a consortial level, managing a system migration project adds a significant challenge because of the competing, often conflicting desires of constituent institutions. budgeting for a migration project needs to be secured before the project takes place. the one-time migration cost has a huge impact on a library's decision on a new system. lengthy procurement processes mean that it can take a year to communicate requirements, solicit bids, and make a final decision. libraries also wonder if they should acquire such a new system through a consortial deal or on their own. a successful implementation of a new system starts with making a sound choice. the system migration project encompasses various technological and management decisions made by project managers, team leaders, and library administrators. decisions about data cleanup, migration mapping, system configuration, communication, and training can have a tremendous impact on project outcomes, staffing, existing workflows, and job functions and responsibilities. in the meantime, the project itself also provides libraries a great opportunity to improve the existing operational and staffing model and to adjust their strategy for managing technological and organizational change. there are few studies on decision-making in the alma/primo selection, procurement, and migration from the user's perspective.
alma is a cloud-based library management system that helps libraries manage, deliver, and discover digital and physical resources. it offers functionalities such as resource discovery, resource management, resource sharing, and analytics. primo ve is a next-generation library discovery platform that provides users with access to a central index of the library's collections. it offers a personalized and intuitive search experience, with features such as faceted searching, saved searches, and item recommendations. both alma and primo ve are ex libris products. this case study fills the gap and provides a better understanding of how american and canadian library leaders and administrators make decisions for their libraries and consortia. the pairing of ex libris's alma and primo products has become a widely accepted next-generation system due to its cloud-based model for managing both electronic and print resources. the findings of this study offer insights and lessons learned to help library leaders and administrators make better decisions on future technological change. literature review the growing user demand for electronic resources over the last decade has led libraries to make a rapid digital transformation to manage and deliver online library services. consequently, system providers have been eager to develop next-generation library systems. organizations have started to adopt cloud computing as their infrastructure. a benefit of cloud computing is that local it staff no longer need to handle hardware failures and software installation. cloud computing streamlines processes and saves time and money. additionally, cloud computing not only enables libraries to deliver resources and services in a network and a library community but also frees libraries from managing technology so they can focus on collection building, service improvement, and innovation.
Therefore, libraries have started to migrate their client-based integrated library systems (ILSs) to cloud-based next-generation systems, often referred to as library services platforms (LSPs). These LSPs can be connected with other web applications, increase collection visibility and accessibility, streamline workflows, reduce duplication of staffing and collections, and create a greener ecosystem for organizations.4 Library consortia have been playing vital roles in resource sharing, cooperative purchasing, discovery, user experience, and technical support. Many libraries migrate to a shared next-generation ILS or LSP by joining a consortium. Besides sharing common needs, participating libraries are quite different with respect to their sizes, the kinds and numbers of resources they provide, services, priorities, and staffing. Although this can pose challenges for participating libraries, such as cost sharing, workflow design, policy, and a collaboration model, libraries still benefit greatly from the shared catalog and enhanced metadata, as well as cooperation at a global level through product communities such as ELUNA and IGeLU.5 The selection of a new system is not a small decision. Calvert and Read pointed out that some libraries succumb to "sheep syndrome," selecting what other libraries have bought, due to a lack of software knowledge.6 Their study suggested that a request for proposal (RFP) could be part of the LSP selection process by providing a consistent set of vendor responses with a narrow scope, a formal statement of requirements for benchmarking, and a mechanism for vendors to compete. Gallagher advised considering existing contracts, financial resources, and RFPs before beginning a system assessment.
He indicated that the expiration date of the current ILS and the opt-out clauses of the existing contract could be indicators of a go-live date. A price quote including a one-time implementation fee, along with a cost-benefit analysis of the current ecosystem compared to the vendor offer, could provide a helpful document that envisions future library services.7 In addition to an RFP, Yang and Venable also considered the library automation marketplace and the needs of their own library when migrating from SirsiDynix Symphony to Alma/Primo.8 Gallaway and Hines embraced competitive usability techniques, using focus groups at Loyola University New Orleans to test a set of standard tasks across multiple systems in order to select a next-generation system.9 They also collected anecdotal information and feedback on the performance of the current library online catalog through a survey of library staff. This evidence-based process makes system selection rational. Manifold, on the other hand, proposed a principled approach to selecting a new LSP. He believed that system selection is part of the continuing process of organizational change and needs to involve library staff and users throughout. Today's LSPs can connect almost the entire range of library operations, from resource management and acquisitions to user request fulfillment and the integration of subject guides for research, teaching, and learning. A system migration is much more than just a move to a new system; it is a transfer to a new culture. He suggested that the acquisitions process must start with educating participants on the features of various systems, methods of vendor assessment, the rules of contract negotiation, communication, and stress management.
The success of system selection and implementation should be measured over the life span of the system to guide new decisions along the way.10 In addition to commercial products, some libraries are acquiring open-source software (OSS) that gives them greater control over customization. The potential benefits of OSS include cost effectiveness, interoperability, user friendliness, reliability, stability, auditability, and customization. Koha, Evergreen, FOLIO, ABCD, WinISIS, NewGenLib, Emilda, PMB (PhpMyBibli), and WEBLIS are examples of OSS ILS/LSP products on the market.11 When selecting and implementing an OSS solution, small libraries such as the Paine College Collins-Callaway Library, with a limited budget and small staff, chose a hosted open-source ILS (Koha) to obtain specific expertise and services at a reasonable price.12 Once a system is selected, the implementation process itself can be critical to the perception of overall system success. Lovins expressed concern about choosing a project management approach that is schedule-driven rather than results-driven. He also recommended organizing implementation activities around the incoming system's functionality. For one consortium-wide system migration, a "train-the-trainer" strategy was adopted in the training program, which mostly offered demonstrations rather than instruction to future trainers.13 The program hardly met libraries' expectations for training. Active staff participation in a system migration is key to project success.
Banerjee and Middleton reported that when library staff owned the migration process, there were fewer mistakes, greater satisfaction with the new system, and quicker troubleshooting of problems that did arise as a result of the migration.14 Avery shared that the God's Bible College libraries did an informal pre- and post-assessment of library users and staff to gather feedback on both the legacy and target ILS. He recommended conducting a formalized pre- and post-evaluation of user satisfaction with the ILS.15 Stewart and Morrison observed that acquisitions workflows in a shared Alma environment must balance required consortial needs with local policies and procedures. Unmet training needs and the lack of an electronic resources management (ERM) module in Alma presented challenges for library staff in developing and managing Alma workflows. They argued that a two-year project cycle was extremely ambitious, especially when the consortium was large and its individual libraries varied widely.16 When migrating from Horizon to Symphony (both SirsiDynix products), King Fahd University of Petroleum and Minerals, based in Dhahran, Saudi Arabia, experienced a delayed implementation. Unmet needs, such as a dramatic shift in workflows, user interface customization, and training support from a system provider or its parent company not matched by a local vendor, became hurdles for the project.17 Although a new LSP, whether Alma/Primo or an OSS product, empowers libraries to create unified workflows across functional modules, this feature requires a system user to have cross-functional roles to conduct these activities.18 When migrating from non-Ex Libris product lines to Alma/Primo, libraries may need to make tough implementation decisions. For example, the University of South Carolina migrated library data to Alma/Primo from Innovative's Millennium and EBSCO's Full Text Finder.
When the legacy and target products are from different vendors, the system migration can be more complicated in communication, data mapping, data quality, and the expected results of data migration. For the USC library, the preexisting duplicate records for electronic resources should have been cleaned up before the migration.19 Libraries should address their concerns about key activities during the implementation to get the best possible result. The Joint Bank-Fund Library had a three-day onsite workflow training in the middle of the project; it would have been much more effective had the library asked the vendor to reschedule the training for a later stage of the migration, because library staff were not yet familiar with the LSP at the scheduled time.20 The University of North Carolina at Charlotte migrated from OCLC's WorldShare Management Services (WMS) to Alma/Primo after migrating from Millennium to WMS four and a half years previously. The Atkins Library went through the second system migration because WMS modules did not meet the library's needs. Going through two system migrations in the span of five years was particularly costly, and frustrated technical services staff spent more than half of their work time on data cleanup. Additional time for data cleaning, workflow design, and training was also needed after the migration to Alma.21 Fu and Fitzgerald studied the effect of LSP staffing models on library systems and technical services by analyzing the software architecture, workflows, and functionality of Voyager and Millennium against those realigned in Alma, WMS, and Innovative's Sierra.
They discovered that the workload of systems staff could be reduced by around 40 percent, giving library systems staff additional time to focus on local application development, the discovery interface, and system integration. Meanwhile, the functionality of a next-generation ILS provides a centralized data-services platform to manage all types of library assets with unified workflows. Consequently, libraries can streamline and automate workflows for both physical and electronic resources through systems integration and enhanced functionality. This change requires libraries to reconsider their staffing models, redefine job descriptions, and even reorganize the library structure to leverage the benefits of a new LSP.22 Western Michigan University (WMU) decided to reorganize its technical services department after the Alma migration was completed in 2015. After the Alma implementation, it was observed that staff spent 38 percent less time working with physical materials. The systems department also shifted its focus from back-end system support to front-end user support and other new technologies. WMU consolidated fourteen departments into six and renamed Technical Services to Resource Management, composed of Cataloging and Metadata, Collections and Stacks, and Electronic Resources. LSP administration was shared by four certified Alma administrators and one discovery administrator residing in the Resource Management department.23 Although researchers and library practitioners have studied ILS selection and implementation processes and the impact of migration on library operations and staffing, only the studies on RFPs and usability testing have focused on decision-making in ILS selection. Today, library administrators and leaders face technological change more often as they transform to a digital business model. They should understand how decisions are made at different organizational levels when managing change.
This study aims to fill this gap and help library administrators and leaders better prepare for future change through the following research questions:

• What is the decision-making process, and what do libraries consider?
• How do libraries evaluate the migration project?
• What are the impacts of the system migration on library staffing and operations?
• What lessons have libraries learned from the system migration?
• What will libraries do differently in a future system migration?

Methods

Researchers have adopted both qualitative and quantitative methods for studies of system migration. The literature indicates that both interviews and surveys have been employed to collect data for these studies.24 Usability testing with a set of tasks across systems has also been utilized in system selection.25 A comparative analysis of vendor documents, RFP responses, and webinars has been applied in studying the impact of system migration on staffing models.26 In this research, the authors used a qualitative method, a survey, to understand decision-making in system selection, procurement, and implementation.

Data Collection

The population for this study is those libraries that implemented or are planning to implement Alma. Through the ELUNA membership management site (https://eluna40.wildapricot.org/), the authors identified 1,440 libraries in the United States and Canada that use at least one Ex Libris product. With help from Sue Julich at the University of Iowa Libraries, who manages the site, 1,150 Alma libraries were identified. The authors also contacted Marshall Breeding, the founder and publisher of Library Technology Guides (https://librarytechnology.org/), and obtained a list of 1,134 Alma libraries in the United States and Canada.
Comparing the Alma libraries acquired from the two different sources, they eventually identified 1,079 libraries from the United States and 55 libraries from Canada as eligible survey participants. The authors developed a 13-question survey in Qualtrics. The questionnaire aimed to help participants recall the project experience and offer them an opportunity to self-reflect and give feedback. The survey was distributed via email to the eligible libraries, and a few email reminders were sent out to encourage participation. Upon the closure of the survey, 291 libraries (27%) had completed the survey in full.

Data Analysis

Qualtrics generates data analysis and reports. The authors conducted a text analysis by manually categorizing responses to the open-ended survey questions to clarify the characteristics of each response, and then presented and analyzed the data in Microsoft Excel.

Findings

Part I: Library Profile & Background Information

The participating libraries have diverse profiles in terms of size and geographic location and reflect points of view ranging from small libraries to library consortia. Remarkably, during the survey, the authors received requests for the complete survey questionnaire so that respondents could coordinate and provide complete and accurate data on behalf of their libraries.

Respondents

The majority of the respondents in this survey were deans, directors of the library or university librarians, and system librarians (see table 1). A wide variety of other position titles across cataloging, acquisitions, technical support, and reference also participated in the survey (see table 2).

Geographic Location

The participating libraries were located in the United States and Canada, and the majority were American libraries (see table 3). The American libraries were distributed across 36 states, while the Canadian libraries came from 4 provinces.
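The manual categorization described under Data Analysis can be sketched programmatically. The category names and keyword lists below are invented for illustration (the authors did this by hand in Excel, and their actual coding scheme is not published here); the sketch only shows the general keyword-matching idea.

```python
# Sketch of keyword-based coding of open-ended survey responses into
# lesson categories. Categories, keywords, and sample responses are
# invented for illustration; they are not the authors' actual scheme.
from collections import Counter

CATEGORIES = {
    "training": ["training", "trainer", "webinar"],
    "communication": ["communication", "stakeholder", "buy-in"],
    "data cleanup": ["cleanup", "duplicate", "data quality"],
}

def categorize(response: str) -> str:
    """Return the first category whose keywords appear in the response."""
    text = response.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "other"

responses = [
    "We needed far more hands-on training before go-live.",
    "Weekly communication with stakeholders was essential.",
    "Duplicate records should have been removed first.",
]
counts = Counter(categorize(r) for r in responses)
print(counts)
```

In practice, manual coding remains more reliable for nuanced responses; a sketch like this is mainly useful for a first pass over a large response set.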
Table 1. The position titles of the respondents

Position title | Percentage
Dean/Director of the Library/University Librarian | 35%
System Librarian | 23%
Other | 42%

Table 2. The other position titles of the respondents

Assessment Librarian; Asset Management Librarian; Assistant Director; Associate Dean; Associate Director; Associate Law Librarian; Associate University Librarian; Cataloging and Metadata Librarian; Cataloging Librarian; Collections Librarian; Consortial Executive Director; Deputy Director of the Library; Director of Library Systems; Director of Library Technology Services; Director of Technical Services; Electronic Resources Librarian; Head Librarian; Head of Acquisitions; Head of Collection Management; Head of Library Systems; Head of Library Technology Services; Head of Metadata and Cataloging; Head of Technical Services; ILS Coordinator; Instructional Technology Librarian; Lead Librarian; Library Technician; Library Technology Manager; Manager of Archives & Access Services; Manager of Digital Services; Manager of Technical Support; Metadata Librarian; Project Director; Public Services Librarian; Reference Librarian/Webmaster; Resource Description and Access Librarian; Solutions Architect, Alma Implementation Project Manager; Supervisor for Access Services; Technical Services and Instruction Librarian; Technical Services Librarian; Technical Services Section Head; Technology Manager

Table 3. The geographic locations of the libraries

Country | Percentage
United States | 92%
Canada | 8%

Library Size

The libraries served a wide range of student populations, from fewer than 1,000 to over 50,000 students (see table 4).
The smallest library had only 199 students, while the largest library system or consortium had 482,000. The number of employees at those institutions ranged from fewer than 100 to over 20,000 faculty and staff (see table 5). The smallest institution may have had only 10 employees, while three larger institutions had over 50,000 faculty and staff.

Table 4. Student population (number of FTEs)

Student population (number of FTEs) | Percentage
<1,000 | 6%
1,000–1,999 | 14%
2,000–2,999 | 10%
3,000–3,999 | 8%
4,000–4,999 | 4%
5,000–5,999 | 6%
6,000–6,999 | 4%
7,000–7,999 | 6%
8,000–8,999 | 4%
9,000–9,999 | 1%
10,000–14,999 | 9%
15,000–19,999 | 8%
20,000–29,999 | 6%
30,000–39,999 | 5%
40,000–49,999 | 3%
50,000+ | 4%

Table 5. Faculty and staff population (number of FTEs)

Faculty/staff population (number of FTEs) | Percentage
<100 | 9%
100–499 | 25%
500–1,000 | 17%
1,000–1,999 | 14%
2,000–2,999 | 7%
3,000–4,999 | 12%
5,000–9,999 | 9%
10,000–19,999 | 4%
20,000+ | 5%

Library Type

The majority of the libraries were single campus libraries; some were part of a multicampus library system or were consortium libraries (see table 6). The other library types included single campus libraries serving more than one institution or location, central offices of a consortium, parts of a statewide system, and independent libraries involved in consortium purchase and implementation of Alma.

Table 6. Library type

Library type | Percentage
Single campus library | 45%
Part of a multicampus library system | 24%
Part of a consortium | 26%
Other | 5%

Previous Integrated Library System (ILS)

The majority of the previous ILSs used by the participating libraries were Voyager, Aleph, Millennium, and Sierra (see table 7), and their vendors were Ex Libris; Innovative Interfaces, Inc.; and SirsiDynix (see table 8).
Thirty-seven percent of libraries reported that they had used their previous ILS for over 20 years before they planned to migrate or migrated to Alma (see table 9). Also, one-fifth of libraries indicated that the system they used prior to Alma was their first ILS, so the move to Alma was their only experience of system migration (see table 10). All libraries used the cataloging, circulation, and OPAC modules in their previous ILSs, and many also used other modules (see tables 11 and 12).

Table 7. The previous ILSs

The previous ILS | Percentage
Voyager | 29%
Aleph | 24%
Millennium | 16%
Sierra | 12%
Symphony | 6%
WorldShare Management Services | 3%
Horizon | 2%
WorkFlows | 2%
TLC | 1%
CLIO | 1%
Evergreen | 1%
Surpass | 1%
The Library Corporation | 1%
Other | 3%

Table 8. The previous system vendors

The previous ILS vendor | Percentage
Ex Libris | 49%
Innovative Interfaces, Inc. | 28%
SirsiDynix | 11%
OCLC | 4%
Endeavor | 1%
TLC | 1%
Surpass | 1%
The Library Corporation | 1%
Other | 5%

Table 9. Years with the previous systems

Years with the previous system | Percentage
3 | 1%
4 | 1%
5–9 | 7%
10–14 | 18%
15–19 | 27%
20+ | 37%
Unknown | 9%

Table 10. Whether the previous system was the first ILS

Was it your first ILS? | Percentage
No | 72%
Yes | 20%
Unknown | 7%

Table 11. Modules used in previous ILSs

Modules used in previous ILSs | Percentage
Cataloging | 100%
Circulation | 100%
OPAC | 100%
Serials | 77%
Acquisitions | 76%
Course reserves | 64%
Interlibrary loan | 28%
Other | 9%

Table 12.
Other modules used in previous ILSs

Analytics; Booking; Course reserves; Discovery system; Electronic resource management; E-reserves; INN-Reach; Licensing

Part II: Implementation Process

Alma Modules/Functions

The majority of libraries reported that they will implement or have implemented the following Alma modules: fulfillment, Primo/Primo VE, resource management, and acquisitions (see table 13). Some libraries mentioned that they used Summon instead of Primo/Primo VE because they had used it before the system migration.

Table 13. Alma modules/functions implemented

Alma modules/functions implemented | Percentage
Fulfillment | 100%
Primo/Primo VE | 93%
Resource management | 92%
Acquisitions | 84%
ERM (electronic resources management) | 77%
Course reserves | 73%
Network Zone | 50%
Interlibrary loan | 40%
Digital collections | 21%
Other | 8%

Selection Process

RFI and RFP

When asked whether an RFI (request for information) was involved, more than half of the libraries responded affirmatively (see fig. 1). About half of the libraries reported that they did not conduct a system functionality survey to collect information from library users and colleagues (see fig. 2). More than half of the libraries indicated that an RFP (request for proposal) process was required for the system migration (see fig. 3). There were a variety of reasons why some libraries did not conduct an RFP process (see fig. 4): an RFP may not be necessary when migrating to a system from the same vendor, there was no increase in expenditure, the expenditure did not reach a budget threshold (e.g., less than $100,000), or the previous contract already covered upgrading to a new product from the same vendor.
Another reason was that libraries might have an existing relationship with a vendor and wanted to continue using its products. Some libraries were given authority by the university administration and library directors to handle the negotiation, or they thought an RFI offered sufficient information to make the decision. Other libraries had no choice about conducting an RFI or RFP process, for reasons such as their system being outdated and migration being unavoidable, the decision being made by the consortium, or Alma being their sole-source procurement.

Figure 1. Whether an RFI (request for information) was involved. (Yes 52%, No 40%, Unknown 8%)

Figure 2. Whether a system functionality survey was conducted. (No 51%, Yes 43%, Unknown 6%)

Figure 3. Whether an RFP (request for proposal) was involved.

Figure 4. The rationales of libraries that did not conduct an RFP.

Decision-Making

The authors found that the common roles involved in the decision-making process included the library dean/director, the Alma local implementation team, and the Alma project working group of a consortium (see fig. 5). Some libraries indicated that their system migration decision was made by university executives (provost, VP finance, CIO, and CFO), campus IT, the AUL for library technology, or all librarians/staff. One library reported that the dean of arts, languages & learning services made the selection decision instead of the library or librarians.

Figure 5. The decision makers.
Important Factors for System Selection

The authors found that the four most important factors in system selection were budget reality; electronic resource management (ERM), bibliographic, and authority control; discovery layers (Primo, Primo VE); and cloud hosting (see table 14).

Table 14. The important factors for system selection

Important factor for system selection | Strongly disagree | Somewhat disagree | Neither agree nor disagree | Somewhat agree | Strongly agree
The budget reality | 3% | 6% | 11% | 34% | 47%
The number of libraries adopted | 7% | 7% | 27% | 40% | 19%
ERM, bibliographic, & authority control | 2% | 2% | 17% | 38% | 41%
Discovery layers (Primo, Primo VE) | 6% | 4% | 13% | 27% | 50%
The analytics/reporting functionality | 4% | 6% | 15% | 41% | 35%
Cloud hosted | 3% | 3% | 12% | 36% | 47%
The campus IT infrastructure & its ecosystems | 8% | 12% | 31% | 31% | 18%
Integration with other ERPs | 12% | 15% | 30% | 33% | 10%
Customer support & satisfaction | 4% | 6% | 21% | 37% | 31%
System user training programs | 5% | 11% | 24% | 38% | 21%

Data Migrated

The most common types of data migrated to Alma were bibliographic records, holdings and items, patrons, and circulation data (see fig. 6). Some libraries reported that they also migrated other types of data, including vendor lists, e-resource data, and all available data types.

Figure 6. The data migrated to Alma.

Discovery Service

The survey asked whether any libraries that migrated to Alma did not choose Primo/Primo VE for their discovery service. Nine libraries reported being in this situation: four used Summon, four chose EBSCO Discovery Service, and one adopted a locally developed product.
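The "four most important factors" reading of table 14 can be reproduced by combining the two agreement columns for each factor. The sketch below uses the percentages from table 14 directly; the shortened factor labels are the only liberty taken.

```python
# Rank the table 14 factors by combined agreement
# ("somewhat agree" + "strongly agree"), using the published percentages.
likert = {
    "budget reality": (34, 47),
    "number of libraries adopted": (40, 19),
    "ERM, bibliographic, & authority control": (38, 41),
    "discovery layers": (27, 50),
    "analytics/reporting": (41, 35),
    "cloud hosted": (36, 47),
    "campus IT infrastructure": (31, 18),
    "integration with other ERPs": (33, 10),
    "customer support & satisfaction": (37, 31),
    "training programs": (38, 21),
}

agreement = {factor: sum(cols) for factor, cols in likert.items()}
top_four = sorted(agreement, key=agreement.get, reverse=True)[:4]
print(top_four)
# -> ['cloud hosted', 'budget reality',
#     'ERM, bibliographic, & authority control', 'discovery layers']
```

The result matches the four factors the authors single out, with cloud hosting (83% combined agreement) and budget reality (81%) at the top.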
When asked the reason for their choices, the nine libraries indicated that they wanted to stay with their existing discovery service. Additionally, two of the libraries cited a budget limitation as part of their reasoning, and one library believed its choice was the better discovery service for its users.

Part III: Feedback on Alma Migration

System Migration Evaluation

The majority of libraries reported that they did not conduct a formal post-migration evaluation. Half of the libraries thought the migration achieved their project goals or met the needs of library operations (acquisitions, cataloging, fulfillment, discovery, etc.) (see fig. 7).

Figure 7. Whether a formal post-migration evaluation was conducted.

Some libraries also described their own migration evaluations, including RFP mandatory-requirements signoff, an availability study, focus groups with library staff, usability testing with students and faculty, feedback and cross-checking with the consortium, and debriefing of library staff. Some did only an informal evaluation, which turned out to be poorly handled or unsatisfactory. For example, one consortium surveyed members about the migration and provided the feedback to Ex Libris for improvement. Other libraries reported that they had not done an evaluation because they had not started the migration process, were still in the migration stage, did not include evaluation in the decision-making process, or received Alma as a free product through their consortial partnerships.

Valuable Lessons Learned

The authors asked what were the most valuable lessons the libraries had learned from the migration project, and how they would implement the migration differently if they had the chance to do it again.
The most valuable lessons concentrated on training, communication, engagement, the implementation process, and data cleanup/preparation (see fig. 8). These lessons are shared in greater detail in the discussion section.

Figure 8. The valuable lessons learned from the migration project.

Prospective Migration

When asked if they would consider working with Ex Libris again if they migrated to a new system in the future, 70 percent of libraries gave an affirmative answer, but some libraries indicated that they would seek alternatives (see fig. 9). When asked how likely they would be to consider implementing an open-source ILS, the majority of libraries conveyed that they would not consider open source; only 7 percent would consider it (see fig. 10).

Figure 9. Whether Ex Libris products would be considered in the future.

Figure 10. Whether an open-source ILS would be considered in the future.

Discussion

The authors examine the above findings further through the lens of the research questions raised in the literature review section.

The Decision-Making Process and Factors Considered

The survey indicates that both the RFI and the RFP are important to a selection process. Fifty-two percent of the libraries conducted an RFI, and 57 percent required an RFP process for the system migration. Interestingly, some libraries did not roll out the RFP process for a variety of sound reasons, such as no increase in expenditure, staying within a budget threshold, existing relationships with vendors, sole-source procurement, a consortium decision, or contract riders.
Besides the RFI and RFP, 43 percent of libraries conducted a system functionality survey to collect information from library users and colleagues. For most libraries, the library dean or director, the Alma local implementation team, or the Alma project working group of a consortium was involved in the decision-making process. In some cases, university executives such as the provost, VP finance, CIO, or CFO, along with campus IT and the associate dean or associate university librarian for library technology, made a collective decision. In a rare case, the dean of arts, languages & learning services made the call on the system selection.

When considering system migration, many factors can be important. This survey shows that libraries mainly consider budget reality; ERM, bibliographic, and authority control; discovery layers; and cloud-hosted systems. It is notable that most libraries want to move to a cloud-based system with better functionality for discovery and electronic resources management. The survey also reveals that library administration needs to find a way to offset the cost increase of the system migration. The lack of comparable system or service offerings in the market also contributes to the decision on system selection.

Project Evaluation

Project evaluation provides important feedback from both system users and system providers, and a great opportunity for libraries to learn. The findings indicate that many libraries do not have a formal assessment process. Some consortia have conducted surveys and provided feedback to Ex Libris, but no response from Ex Libris to that feedback was reported. Both libraries and system vendors have lost the opportunity to learn and improve project management. For example, well-documented complaints about dissatisfaction with Ex Libris training have not been effectively addressed. Some libraries believe a demonstration-focused training model does not provide the same experience that onsite training offers.
Many libraries have had trouble with acquisitions workflows. The EOCR (electronic order confirmation record) and EDI (electronic data interchange) processes, which generate order records and create invoices automatically, are standard practice in libraries today and should be part of the implementation contract to ensure that libraries can operate properly after a new system goes live. It is time for both libraries and system providers to consider a formal project assessment as part of future system migrations. Libraries cannot improve if they do not know where previous projects went wrong, and project assessment is the best way to learn from those mistakes.

Impacts on Library Staffing and Operations

Some libraries reported that insufficient staffing during the system migration created additional problems and hardships. Some library departments were stretched very thin working on the migration project on top of their regular operational duties. Meanwhile, about one-third of the survey-participating libraries reported that meeting the needs of library operations, including acquisitions, cataloging, fulfillment, and discovery, was a criterion of project evaluation. The lack of dedicated LSP migration staff creates a challenge for system migration. Most importantly, additional staff time and technical capacity are important factors in whether libraries can take full advantage of the functionality of a new system. Libraries might manage the migration better by hiring additional technical staff on a project basis to handle technical aspects when existing staff cannot be released from library operations to focus on the migration project.
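The EOCR/EDI automation discussed above ultimately comes down to machine-readable order and invoice messages exchanged with vendors. As a heavily simplified illustration only: EDIFACT-style messages separate segments with `'` and elements with `+`; the sample message, the use of the `MOA+203` qualifier for line amounts, and the parser itself are a sketch, not a production-grade EDI implementation (real INVOIC messages carry many more segments and escape rules).

```python
# Simplified illustration of EDI invoice processing: split an EDIFACT-style
# message into segments and sum the line-item amounts. The sample message
# is invented and abbreviated; a real parser must handle envelopes,
# release characters, and the full INVOIC segment set.
def parse_segments(message: str) -> list[list[str]]:
    """Split an EDIFACT-style message into segments, each a list of elements."""
    return [seg.split("+") for seg in message.strip("'").split("'")]

def line_totals(message: str) -> float:
    """Sum MOA (monetary amount) segments carrying the 203 line-amount code."""
    total = 0.0
    for seg in parse_segments(message):
        if seg[0] == "MOA":
            qualifier, amount = seg[1].split(":")
            if qualifier == "203":
                total += float(amount)
    return total

sample = ("LIN+1++9780000000001:EN'MOA+203:45.50'"
          "LIN+2++9780000000002:EN'MOA+203:30.00'")
print(line_totals(sample))
```

The point of contractually requiring working EOCR/EDI at go-live is precisely so that library staff never have to hand-key the data such messages carry.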
The system integration and unified automated workflows of a modern LSP can enable libraries to run their operations more efficiently. Particularly in a shared environment or network, libraries could share bibliographic records for general collections more widely and deeply, which could dramatically reduce the need for both original and copy cataloging. System staff no longer need to install or upgrade proprietary software and maintain servers in house. These changes might cause job insecurity for some library staff. It is critical for library leaders to adjust some job responsibilities and help staff develop new skills to meet new demands. This requires library administration to create a culture of embracing change, learning, and collaboration. Staff can take advantage of a new system by being curious and reassessing previous workflows. Library administration could create a flexible structure to encourage learning and collaboration across departments.

Lessons Learned

Many libraries shared valuable lessons they learned from their migration projects. Those lessons concentrate on training, communication and engagement, the implementation process, and data cleanup and preparation.

Training

Many libraries expressed dissatisfaction with the training provided by their vendor. For example, libraries moving to Alma reported that Ex Libris could have focused more on in-person, post-migration training. As it was, staff felt undertrained because they had access only to online training before the libraries had access to their own data in Alma/Primo. Additionally, Ex Libris did not assign regular trainers to a particular library, so there was less continuity across training sessions than there could have been. Some suggest that Ex Libris conduct a concentrated, several-day initial training for migration so that libraries have a solid overview of the entire system before data exports for testing loads, and then delve into detailed weekly training that includes more library staff.
It seems a good idea to schedule more training sessions after implementation because libraries may not know how the system functions during the implementation period. In an ideal world, libraries would put more contractual obligations on Ex Libris to train staff more thoroughly. After all, libraries need to hold Ex Libris more accountable for project outcomes. Consortium libraries should insist that Ex Libris provide specialized individual trainers and technical contacts. Attending group training sessions conducted by a variety of different Ex Libris trainers does not work well in large migration projects. Ex Libris needs to train the library staff rather than focusing on training the consortium support staff and expecting them to do most of the staff training. Ex Libris does offer a variety of free training webinars; however, for bespoke or intimate training sessions, it charges its customers. A barrier for many libraries is that they simply cannot afford to pay more for these bespoke training sessions, so they depend on in-house training and best practices (e.g., work groups, training committees, in-house power users, etc.) to manage the training needs of their library personnel.

Communication and Engagement

Many libraries express that communication is extremely important and that buy-in from stakeholders at all levels is critical to the migration project's success. Investing the initial time to get all stakeholders on board will pay off. Blocking off time for weekly meetings with involved staff and Ex Libris is key. Some suggested asking more questions and seeking to understand the functionality of the new system more deeply.
For consortial libraries, librarians can become much closer to each other and learn to seek out and receive help from one another in ways they might never have before. This networking can be an invaluable source of mutual support going forward. Some libraries reported that, due to a lack of communication, an overly sudden decision on the implementation timeline was made at the legislative level. Information regarding requirements and expenses was not fully clarified before the process began and came as a surprise during the migration. The whole process felt very rushed by the vendor, with insufficient training, which turned out to be very dissatisfying.

Implementation Process

A system migration is complex and requires a great deal of time, institutional resources, and staff. Some key processes need to be better prepared in advance, such as staff training, project plans and major milestones, system analysis, customer input for implementation and configuration, data cleanup, physical-to-electronic (P2E) processing, source data extraction, validation and delivery, workflow analysis, the fulfillment network, authentication, third-party integrations, data review and testing, a go-live readiness checklist, etc. In practice, the migration was often more time- and resource-intensive than expected, meaning that libraries found it difficult to complete their part of the process in the contractually specified time. Libraries should clear the decks of core staff to focus on migration and make sure there are no other major projects occurring at the same time. If staff have insufficient time during the migration window, libraries need to hire temporary experienced staff for the project. This investment will benefit library operation in the long run. The implementation team members should have more dedicated time to be trained so that the library staff are well prepared and knowledgeable in the areas in which they work.
It is wise to clean up data as much as possible prior to migration. It would be ideal if the existing workflows were fully documented with diagrams so that it would be easier to determine what parts of the workflows need to change. Some libraries reported that their migration happened during the pandemic with state-issued stay-at-home orders in force. It was extremely stressful juggling all of the changes for the library while keeping up with the system migration. Ideally, it would be better to avoid a migration during a pandemic and postpone it. But if libraries have no other choice, one benefit is to take advantage of closures for cutover days. The stress of the implementation and of trying to get things done may cause frustrations to boil over. It is advised to manage these situations by adding additional support where needed and by always ensuring that communication is a top priority so that any confusion is kept to a minimum. For consortial libraries, it is important for individual member institutions to have their own project managers. Some consortial libraries would have tried to standardize more configurations across the consortia, like user groups, circulation settings, item types, etc. Some libraries felt the whole migration process was rushed by the vendor, which turned out not to be very successful. Libraries should not let the vendor talk them into a compressed, several-month migration timeline; instead, they should spend more time in the preparation and implementation process.

Data Cleanup and Preparation

Although it is tedious and time consuming, many libraries suggested cleaning up data as much as possible prior to migration. More pre-migration data cleanup would avoid a post-migration mess.
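Pre-migration cleanup of the kind recommended here is often scripted rather than done by hand. The sketch below shows one hypothetical pass: normalizing bibliographic titles and flagging likely duplicate records before extraction. The record fields and the matching rule are invented for illustration and are far simpler than real MARC data.

```python
# Hypothetical pre-migration cleanup pass: normalize titles and flag
# probable duplicate bibliographic records so they can be reviewed before
# the data is handed to the migration vendor. The record schema is invented.
import re
from collections import defaultdict

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and drop a leading article for matching."""
    t = re.sub(r"[^\w\s]", "", title.lower()).strip()
    return re.sub(r"^(the|a|an)\s+", "", t)

def find_duplicates(records):
    """Group records whose (normalized title, ISBN) keys collide."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_title(rec["title"]), rec.get("isbn"))
        groups[key].append(rec["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

records = [
    {"id": "b1", "title": "The Library Handbook.", "isbn": "9780000000001"},
    {"id": "b2", "title": "Library handbook", "isbn": "9780000000001"},
    {"id": "b3", "title": "Cataloging Basics", "isbn": "9780000000002"},
]
print(find_duplicates(records))
```

Even a crude pass like this surfaces collisions that would otherwise migrate as duplicate bibs and become the "post-migration mess" the survey respondents describe.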
Some libraries recommended more stringent cleanup of catalog records, acquisitions data, circulation data, patron records, weeding, etc. It is important to make sure the cataloging structure matches the structure of the new system. Had they taken the data review stage more seriously and fully modeled the processes and workflows that would be needed, they would have had fewer data cleanup problems to address after the migration was complete. Some libraries cautioned that Alma's P2E (physical to electronic) migration process was more complex than anticipated. They stated that the P2E conversion did not work as it should have, and Ex Libris should do a better job in the future. Due to misalignment of source and target collections, the P2E process resulted in a large cleanup after the migration. A number of libraries would have asked more questions about what data was migrated and to where. Ex Libris had migrated data that should not have been migrated; as a result, a messy system became a reality.

Planning for Future System Migrations

When asked what they would do differently in a future system migration, many libraries provided very interesting insights. Some libraries believed that the system migration put library leadership in a difficult position. They needed to engage all library employees in decision-making and provide staff with the resources they needed to navigate change, experience the vulnerability of learning a new system, and even have difficult conversations with colleagues. At the same time, library leaders are accountable to their parent organizations and subject to budget pressure and mandates to follow procurement processes, which are geared around efficiency and hierarchy rather than promoting democratic decision-making and self-governance. Many libraries expressed a concern about training.
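Much of the P2E difficulty described above comes down to deciding which records should be treated as electronic before the conversion runs. The following is a hedged sketch of a pre-migration P2E pre-check; the record structure and the electronic indicators are hypothetical and much simpler than a real MARC/Alma export, and Alma's actual P2E input file format is different.

```python
# Hedged sketch of a P2E pre-check: flag records carrying electronic
# indicators (a URL, an "online" carrier note) so they can be reviewed and
# routed to the P2E list. The record schema here is invented for the example.

ELECTRONIC_HINTS = ("online resource", "electronic", "streaming")

def classify_record(rec: dict) -> str:
    """Return 'p2e-candidate' if the record looks electronic, else 'physical'."""
    if rec.get("url"):                      # e.g., a MARC 856$u would be present
        return "p2e-candidate"
    carrier = rec.get("carrier", "").lower()
    if any(hint in carrier for hint in ELECTRONIC_HINTS):
        return "p2e-candidate"
    return "physical"

def build_p2e_list(records):
    """Collect the IDs of records that should be reviewed for P2E conversion."""
    return [r["id"] for r in records if classify_record(r) == "p2e-candidate"]

records = [
    {"id": "b10", "carrier": "volume", "url": None},
    {"id": "b11", "carrier": "online resource", "url": "https://example.org/ebook"},
    {"id": "b12", "carrier": "volume", "url": "https://example.org/journal"},
]
print(build_p2e_list(records))
```

Reviewing a list like this before cutover is one way to catch the "misalignment of source and target collections" that respondents said forced a large post-migration cleanup.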
They stated that they would demand a separate contract for training in the future and put more contractual obligations on system providers to train staff more thoroughly. They would spell out in greater detail what a successful migration would consist of in order to hold Ex Libris responsible for outcomes. During the bidding process, library staff should be less distracted by smooth presentations and instead ask difficult questions about system functionality. Another concern is pricing. One early adopter of Alma stated that they learned the risks, rewards, and excitement of helping with a developing product, as they felt Aleph was a dead end and did not see many other alternatives. They would have negotiated more strongly with Ex Libris on pricing, considering the immaturity of the product and pricing model at the time of adoption. Some libraries felt they were not given competitive pricing, and their costs went up significantly, which constituted a large budget shift. Some small libraries believed Alma is too big for them, and OCLC might be more appropriate for the size of their collections and materials. They realized they underutilized a very expensive system. Some libraries preferred a customized implementation as opposed to the one-size-fits-all model Ex Libris offered. They stated that despite learning the new system, they found that the solutions Ex Libris offered for their implementation rarely worked. They would have been better off fitting their own workflows into Alma (especially for budgeting). Ex Libris seems not to be ready to work with single-campus small colleges. Other libraries reported that they had multiple people in a project management role, which created communication issues. They learned that in any future migration process they should have a single project manager empowered to make decisions.
For consortium libraries, some suggested taking advantage of cohorts of migrating institutions to share information and issues and raise common questions. They would have made some local decisions instead of simply going with the consortium. One consortium experienced a major difficulty in that the group implementation took place across different countries; the time difference with their implementation team added an additional dimension to project management. They would have done an individual migration instead of a group migration, since they had a very complex institutional structure. Some libraries strongly recommended open-source systems as well. They believed that the trend toward vertical consolidation of vendors is not healthy for the library system market in the long run. With mergers and acquisitions, gigantic companies are formed and might overly control the market and pricing.

Conclusions

Decision-making on the selection, procurement, and implementation of a new LSP is a process that requires gathering information and seeking input from library administration, experts, and different levels of stakeholders in a systematic way to ensure system quality, fitness, and a successful implementation. The findings suggest that libraries should adopt an RFI/RFP (request for information/proposal) or a system functionality survey as the basis for system selection. Budget, resource discovery, and electronic resources management are the most important factors to be considered in an ILS selection. Staffing time and technical capability must be addressed before implementing a new system to enable libraries to manage user expectations. Insufficient staff and a lack of technical skills could affect the realization of the benefits of a new system. Technological change can shift staff job responsibilities and lead to a new way of working together.
It is important for library administration to address organizational change when making technological change. A formal project assessment is essential for libraries and system providers to learn and improve collectively. Open-source systems could open doors for libraries seeking more customized and affordable systems.

Research Limitations

Like all research studies, this study has limitations that provide opportunities for further investigation. Firstly, because we asked for responses from individuals, not libraries, the findings might be biased by participants' individual experiences. Secondly, due to limitations of time, space, and the number of survey questions, the reported data mainly focused on Alma libraries and could not cover the migration experiences of libraries migrating to other products or all aspects of system migration. Further research interviewing participating libraries of different sizes, types, and geographic locations, as well as different system providers, would benefit the library community.

Practical Implications

Every new system has its advantages and downsides. To help libraries fully take advantage of a new system, it would be helpful if vendors could evaluate training, the physical-to-electronic (P2E) process, and system affordability. Providing training after a system goes live will help libraries implement workflows effectively and give staff a better experience. P2E is crucial for ensuring that all relevant information is transferred and maintained in the new system. Vendors could address potential P2E issues before a system migration takes place so that libraries might approach data cleanup differently. It would be great if vendors could customize system modules or functionalities as needed by both small and large libraries.
This will give libraries flexibility to invest in the most needed library operations at different prices to make the system affordable. Customer service can be crucial for libraries to continue optimizing the new system down the road. Regularly seeking libraries' feedback can foster a positive customer relationship and benefit both libraries and vendors.

Acknowledgements

The authors appreciate the support of Marshall Breeding and Sue Julich for providing the library contact lists. The authors would also like to thank the Office of Research Integrity for reviewing the survey questionnaire and providing comments. Much gratitude goes to the survey participants who volunteered their time to participate in this study and took the time to communicate with the authors in order to provide accurate responses for their libraries or consortia.

Appendix: Survey Questionnaire

Adult Online Consent to Participate in a Research Study
A Customers' Perspective: Decision-Making on System Migration

Summary Information

Things you should know about this study:
• Purpose: The purpose of the study is to understand how library leaders make decisions on system migration during technological change and the impact of these decisions on library operation and staff.
• Procedures: If you choose to participate, you will be asked to answer 12 multiple-choice questions and 3 open-ended questions.
• Duration: This will take about 15 to 20 minutes.
• Risks: There is little risk or discomfort from this research since you share your project experience anonymously.
• Benefits: The main benefit to you from this research is to self-reflect on the project and have an opportunity to share the project experience. We plan to publish our findings, which will bring potential benefits to you and the library community.
• Alternatives: There are no known alternatives available to you other than not taking part in this study.
• Participation: Taking part in this research project is voluntary.

Please carefully read the entire document before agreeing to participate.

Confidentiality

The records of this study will be kept private and will be protected to the fullest extent provided by law. In any sort of report we might publish, we will not include any information that will make it possible to identify you. Research records will be stored securely, and only the research team will have access to the records. The following questions are for general analytical use only. Although Qualtrics does not collect your email address, please do not provide personally identifiable information (PII) with your answers. If PII appears in the responses, we will apply a data anonymization process after the results are added into the final tally.

Right to Decline or Withdraw

Your participation in this study is voluntary. You are free to participate in the study or withdraw your consent at any time during the study. You will not lose any benefits if you decide not to participate or if you quit the study early. The investigator reserves the right to remove you without your consent at such time that he/she feels it is in the best interest.

Researcher Contact Information

If you have any questions about the purpose, procedures, or any other issues relating to this research study, you may contact Jin Guo (jiguo@fiu.edu) or Gordon Xu (gordon.xu@njit.edu).

IRB Contact Information

If you would like to talk with someone about your rights as a subject in this research study or about ethical issues with this research study, you may contact the FIU Office of Research Integrity by phone at 305-348-2494 or by email at ori@fiu.edu.
Participant Agreement

I have read the information in this consent form and agree to participate in this study. I have had a chance to ask any questions I have about this study, and they have been answered for me. By clicking on the "Consent to Participate" button below I am providing my informed consent.

[Consent to Participate]

Section I: Library Profile and Background Information

1. Your title:
a. Dean/director of the library/university librarian
b. System librarian
c. Other (please specify: _________________)

2. Describe your institution
a. Location
i. US
ii. Canada
iii. State

2. Total student and faculty population
a. Total student population (number of FTEs)
b. Total faculty population (number of FTEs)

3. Information about your library
a. Single campus library
b. Part of a multicampus library system
c. Part of a consortium
d. Other (please specify: _________________)

4. Previous ILS:
a. The previous ILS name:
b. The previous ILS vendor:
c. Years with the previous system:
d. Was it your first ILS?
a. Yes
b. No

5. ILS modules in use prior to Alma migration: (please check all that apply)
a. Acquisitions
b. Cataloging
c. Circulation
d. Interlibrary loan
e. Reserves
f. Serials
g. OPAC
h. Other (please specify: _____________________)

Section II: Alma Implementation Process

6. Alma modules/functions implemented: (please check all that apply)
a. Acquisitions
b. Resource management
c. Fulfillment
d. Interlibrary loan
e. Course reserves
f. ERM
g. Network Zone
h. Primo/Primo VE
i. Digital collections
j. Other (please specify: ________________________)

7. The system selection process
• Was an RFI (request for information) involved?
a. Yes
b.
No

• Did you conduct a system functionality survey to collect information from library users and colleagues?
a. Yes
b. No

• Was the RFP (request for proposal) process required?
a. Yes, please specify the person/department that prepared the RFP: _____
b. No, please provide the reason why (e.g., budget cap less than $100K, etc.): _____

8. Who was involved in the decision-making process?
• Alma project working group (consortium)
• Alma local implementation team
• Project manager(s)
• Library dean
• Institutional coordinators/leads
• Departmental heads
• Others (please specify: ______)

9. What are important factors for system selection (5 points, weight per response)?
• The budget reality
• The number of libraries that adopted it
• E-resource management (ERM), bibliographic, and authority control
• Discovery layers (Primo, Primo VE)
• The analytics/reporting functionality
• Cloud hosted
• The university/college IT infrastructure and its ecosystems
• Integration with other ERP (enterprise resource planning) systems/platforms
• Customer support & satisfaction
• System user training programs

10. What data was migrated (please select all that apply)?
• Authority data
• Bibliographic records
• Holdings and items
• Patrons
• Loans, holds, and fines
• Acquisitions
• Course reserves
• Digital metadata and objects

11. Please skip this question if you use Primo/Primo VE. If you chose non-Ex Libris products for discovery service, please specify the product ____ and select the possible reason below:
• Budget limitation
• Stay with the existing discovery service
• Others

Section III: Feedback on Alma Migration Project

12. How did your library evaluate the system migration project?
• No formal post-migration evaluation
• User satisfaction survey
• Achieved the project goals
• Met the needs of library operations (acquisitions, cataloging, fulfilment, discovery, etc.)

13. Open-ended questions
• What are the most valuable lessons you have learned from this project? If you had a chance to do it again, how would you implement the migration differently?
• Would the library consider working with Ex Libris again if it were to migrate to a new system in the future?
• How likely is it that this library would consider implementing an open-source ILS?

Endnotes

1 Zhonghong Wang, "Integrated Library System (ILS) Challenges and Opportunities: A Survey of US Academic Libraries with Migration Projects," The Journal of Academic Librarianship 35, no. 3 (2009): 207–20, https://doi.org/10.1016/j.acalib.2009.03.024.

2 Teri Oaks Gallaway and Mary Finnan Hines, "Competitive Usability and the Catalogue: A Process for Justification and Selection of a Next-Generation Catalogue or Web-Scale Discovery System," Library Trends 61, no. 1 (2012): 173–85.

3 Guoying Liu and Ping Fu, "Shared Next Generation ILSs and Academic Library Consortia: Trends, Opportunities and Challenges," International Journal of Librarianship 3, no. 2 (2018): 53–71.

4 Matt Goldner, "Winds of Change: Libraries and Cloud Computing," BCLA Browser: Linking the Library Landscape 4, no. 1 (2012): 1–7.

5 Liu and Fu, "Shared Next Generation," 53–71; Jone Thingbø, Frode Arntsen, Anne Munkebyaune, and Jan Erik Kofoed, "Transitioning from a Self-Developed and Self-Hosted ILS to a Cloud-Based Library Services Platform for the BIBSYS Library System Consortium in Norway," Bibliothek Forschung und Praxis 40, no.
3 (2016): 331–40, https://doi.org/10.1515/bfp-2016-0052.

6 Philip Calvert and Marion Read, "RFPs: A Necessary Evil or Indispensable Tool?" Electronic Library 24, no. 5 (2006): 649–61.

7 Matt Gallagher, "How to Conduct a Library Services Platform Review and Selection," Computers in Libraries 36, no. 8 (2016): 20.

8 Zhongqin (June) Yang and Linda Venable, "From SirsiDynix Symphony to Alma/Primo: Lessons Learned from an ILS Migration," Computers in Libraries 38, no. 2 (March 2018): 10–13.

9 Gallaway and Hines, "Competitive Usability," 173–85.

10 Alan Manifold, "A Principled Approach to Selecting an Automated Library System," Library Hi Tech 18, no. 2 (2000): 119–30, https://doi.org/10.1108/07378830010333455.

11 Ayoku A. Ojedokun, Grace O. O. Olla, and Samuel A. Adigun, "Integrated Library System Implementation: The Bowen University Library Experience with Koha Software," African Journal of Library, Archives and Information Science 26, no. 1 (2016): 31–42.

12 Lyn H. Dennison and Alana Faye Lewis, "Small and Open Source: Decisions and Implementation of an Open Source Integrated Library System in a Small Private College," Georgia Library Quarterly 48, no. 2 (Spring 2011): 6–9.

13 Daniel Lovins, "Management Issues Related to Library Systems Migrations. A Report of the ALCTS CaMMS Heads of Cataloging Interest Group Meeting, American Library Association Annual Conference, San Francisco, June 2015," Technical Services Quarterly 33, no. 2 (2016): 192–98, https://doi.org/10.1080/07317131.2016.1135005.
14 Kyle Banerjee and Cheryl Middleton, "Successful Fast Track Implementation of a New Library System," Technical Services Quarterly 18, no. 3 (2001): 21–33.

15 Joshua M. Avery, "Implementing an Open Source Integrated Library System (ILS) in a Special Focus Institution," Digital Library Perspectives 32, no. 4 (2016): 287–98, https://doi.org/10.1108/dlp-02-2016-0003.

16 Morag Stewart and Cheryl Aine Morrison, "Breaking Ground: Consortial Migration to a Next-Generation ILS and Its Impact on Acquisitions Workflows," Library Resources & Technical Services 60, no. 4 (2016): 259–69.

17 Zahiruddin Khurshid and Saleh A. Al-Baridi, "System Migration from Horizon to Symphony at King Fahd University of Petroleum and Minerals," IFLA Journal 36, no. 3 (2010): 251–58, https://doi.org/10.1177/0340035210378712.

18 Efstratios Grammenis and Antonios Mourikis, "Migrating from Integrated Library Systems to Library Services Platforms: An Exploratory Qualitative Study for the Implications on Academic Libraries' Workflows," Qualitative and Quantitative Methods in Libraries 9, no. 3 (September 2020): 343–57, http://qqml-journal.net/index.php/qqml/article/view/655/585.

19 Abigail Wickes, "E-Resource Migration: From Dual to Unified Management," Serials Review 47, no. 3–4 (2021): 140–42.

20 Yang and Venable, "From SirsiDynix," 13.

21 Joseph Nicholson and Shoko Tokoro, "Cloud Hopping: One Library's Experience Migrating from One LSP to Another," Technical Services Quarterly 38, no. 4 (2021): 377–94.
22 Ping Fu and Moira Fitzgerald, "A Comparative Analysis of the Effect of the Integrated Library System on Staffing Models in Academic Libraries," Information Technology and Libraries 32, no. 3 (September 2013): 47–58.

23 Geraldine Rinna and Marianne Swierenga, "Migration as a Catalyst for Organizational Change in Technical Services," Technical Services Quarterly 37, no. 4 (2020): 355–75, https://doi.org/10.1080/07317131.2020.1810439.

24 Vandana Singh, "Experiences of Migrating to Open Source Integrated Library Systems," Information Technology and Libraries 32, no. 1 (2013): 36–53, https://doi.org/10.6017/ital.v32i1.2268; Shea-Tinn Yeh and Zhiping Walter, "Critical Success Factors for Integrated Library System Implementation in Academic Libraries: A Qualitative Study," Information Technology and Libraries 35, no. 3 (2016): 27–42, https://doi.org/10.6017/ital.v35i3.9255; Grammenis and Mourikis, "Migrating from Integrated Library Systems," 343–54; Xiaoai Ren, "Service Decision-Making Processes at Three New York State Cooperative Public Library Systems," Library Management 35, no. 6 (2014): 418–32, https://doi.org/10.1108/lm-07-2013-0060; Wang, "Integrated Library System," 207–20; Pamela R. Cibbarelli, "Helping You Buy ILS," Computers in Libraries 30, no. 1 (2010): 20–48, https://www.infotoday.com/cilmag/cilmag_ilsguide.pdf; Calvert and Read, "RFPs," 649–61.

25 Gallaway and Hines, "Competitive Usability," 173–85.

26 Fu and Fitzgerald, "A Comparative Analysis," 47–58.
Book Reviews

The Future of the Printed Word: The Impact and Implications of the New Communications Technology. Edited by Philip Hills. Westport, Conn.: Greenwood, 1980. 172p. $25. LC: 80-1716. ISBN: 0-313-22693-8 (lib. bdg.).

The character of this volume is as much that of a topical journal or annual review as that of a monograph. A dozen authors have contributed thirteen chapters, all but one prepared especially for this publication. Ten of the chapters are by British authors, two by Americans, and one by European Community personnel located in Luxembourg. An amusing Punch satire about BOOK (Built-in Orderly Organized Knowledge) is reprinted as an unnumbered fourteenth chapter. In an excellent opening essay, John M. Strawhorn notes: "In this book, the expression printed word is construed very broadly, to include words in any kind of display: paper, microforms, CRTs, plasma panels and so on."
His essay is a terse but pointed review of the organization of information transfer, some current trends, factors affecting acceptance of new technologies, and some broad projections for the future. Provocative essays by Maurice B. Line and P. J. Hills, editor of the volume, explore the printed word from the points of view of a bookperson and an educator. In one of the most elegant metaphors to appear in information science literature, Line suggests: "The printed butterfly will emerge from its electronic chrysalis, but it will also return again to it in due time. The vast majority of documents will thus be stored in electronic (chrysalis) form, but the majority of those used at any given time will be in their printed (butterfly) form." Two incisive and thorough chapters on official information by Patricia Wright systematically explore the use of old and new technologies for forms, leaflets, and signs. Wright makes acute and useful observations on how technology can hinder or help the gathering and dispersion of governmental information. The Graphic Information Research Unit of the Royal College of Art has done excellent work in recent years in exploring how various display options affect comprehension. Linda Reynolds provides a good essay, "Designing for the New Communications Technology," based on that research. The review of prospects for electronic journal publishing by Donald W. King is a good overview, especially for beginners. A chapter on Euronet DIANE describes problems in creating an online database capability in the European political environment. Chapters on printing technologies, microforms, and videodiscs cover all major alternatives but suffer from brevity. Two brief but competent speculative essays, which add little, complete the volume. The work lacks a general index, but the organization of chapters makes this a minor flaw.
Use of presumably common British acronyms without explanation, especially in credits and citations, is an irritant for non-U.K. readers. The work would make an excellent supplementary text for a course on the history of the book. Practitioners in publishing or library and information science will find much of interest.-Brian Aveney.

Journal of Library Automation vol. 14/3, September 1981

Turnkey Automated Circulation Systems: Aids to Libraries in the Market Place. Edited by Judith Bernstein. Chicago: American Library Assn., 1980. 332p. $10.50.

When my library entered the marketplace for an automated circulation system, I searched the literature for aids. Had I found this book at that time I would have been disappointed. What I would expect from a 332-page book with the subtitle "Aids to Libraries in the Market Place" would be numerous examples of what had been done before. I would expect samples of the analyses that other libraries had done to justify entering the marketplace, samples of the RFPs that had been sent to vendors, and samples of the contracts that had been signed. I would like to see a case study (or two) of the complete process of procurement. Admittedly, this expectation is somewhat of an ideal, but these are "aids" that we searched for and that other libraries now ask from us. What does this book provide? An editorial introduction gives a sense of the difficulties of the marketplace and the frustrations encountered in it. A two-page bibliography gives a reasonable selection of readings to provide a background for decision making. A discussion titled "Hiring a Consultant-Why and How" is a very useful enumeration of details to be considered in the decision to hire a consultant and in the agreement with a consultant. A model request for proposal is a good synthesis of the details to be included in almost every library's RFP and thus provides a starting point for the library new to the marketplace.
All of this is what I consider to be the substance of this book, and it ends at page 40. The remaining 292 pages are devoted to the "profiles" of individual libraries which have installed automated circulation systems. The profiles are intended to assist in the identification of libraries to be contacted for further information, but provide little useful information by themselves. My primary objection to this book is the misleading nature of the citation. One expects more than three hundred pages of "aids" and finds a directory with a forty-page preface. But for the librarian new to the marketplace it may be worth the price.-Alan E. Hagyard, Yale University Library, New Haven, Connecticut.

Archives and the Computer, by Michael Cook. London: Butterworths, 1980. 152p. $29.95. LC: 80-41286. ISBN: 0-408-10734-0.

Michael Cook recognizes the special predicament of the archivist, whose job consists of trying to satisfy three contradictory needs: (1) the need to arrange and describe archives by their provenance, (2) the need to store them most efficiently by shape and size, and (3) the need to access them to answer inquiries that are mostly subject-oriented. The solution to these conflicting requirements may come from the computer. As Cook says, "the speed and variety of computerized lists and indexes derived from a single data base could solve this problem by producing finding aids in all possible sorts of order." In a very handsomely produced, sturdily bound book, Archives and the Computer, Michael Cook, archivist of the University of Liverpool, reports on various computer systems serving the needs of archivists. His book starts with a general discussion on the nature of automated systems and their relation to manual ones. This is followed by the description of a select group of archives systems-some still in use, others put to their well-deserved rest after a few years' use.
He covers records management systems (i.e., the area of handling current records) and archives management systems (i.e., the handling of noncurrent documents). In the final chapter Cook moves the discussion away from computer processing of traditional, familiar forms of archival material, focusing instead on processing archives that are themselves machine-readable data files. How does the archivist accomplish all of the necessary tasks if the archives are not readable by the human eye? How does he appraise, arrange, describe, and access them? I like Mr. Cook's cautious and sober attitude. Talking about system design, he remarks, "At this stage decisions will be made which will be irrevocable in practical terms, and may cause much trouble later." About implementation and testing: "Computer systems should help people to work more effectively in a more interesting environment; if they fail in this, or appear to fail, there is something wrong, and it would perhaps be better not to introduce the change." The records management systems he describes are used by British county and city record offices. An interesting feature in one of them, a system called ARMS, is a printout that tabulates for each class of documents the number of requests in a year, per year stored. This printout could be very helpful in modifying established retention periods on the basis of experience. The following archives systems are described: PROSPEC (adopted by the Public Record Office of London), NARS A-1 (used by the National Archives of the USA), SPINDEX (first used by the National Archives and the National Historical Publications and Records Commission), SELGEM (used by the archives of the Smithsonian Institution), STAIRS (an IBM system, used, among others, by the House of Lords Record Office in London), PARADIGM (developed and used at the University of Illinois), MISTRAL (used by the National Archives of Ivory Coast), and ARCAIC (used and abandoned by the East Sussex Record Office).
Of all these systems, I found the description of SELGEM the most educational. Besides listing the fields making up a computer record, Cook shows an example of an actual record as it appears in the master list and as it appears in the printed guide to the archives. He also includes an actual segment of the name/subject index. Although there is a brief mention of the choice between networking versus isolated, separate systems, the book does not speculate about the possibility of a network of many institutions building a common database. Nor does the author discuss the much debated and very timely question of whether archivists could possibly agree on a uniform computer record for the description of manuscripts and archives, similar to the way in which librarians have agreed on using the MARC formats for the description of their materials. A glossary of technical terms, a "select directory" of archival systems, and a "select bibliography" are useful additions to the main text. This book is recommended more to the archivist looking for a computer system than to the systems analyst who wants to learn how archives are processed.-Suzanna Lengyel, Yale University Library, New Haven, Connecticut.

The Library and Information Manager's Guide to Online Services. Edited by Ryan E. Hoover. White Plains, N.Y.: Knowledge Industry Publications, 1980. 270p. $29.50 hardcover, $24.50 softcover. LC: 80-21602. ISBN: 0-914236-60-1 (hardcover); 0-914236-52-0 (softcover).

Hoover and seven colleagues provide an overview of the main issues and techniques involved in starting and managing an online retrieval service. The emphasis is on a library setting-the implicitly broader focus conveyed by the title is not matched by any specific coverage of, for example, the online search activity of the for-profit information brokers, where funding, staffing, publicizing, and the search process itself are handled differently than in libraries.
The three large, general search services (Lockheed, SDC, and BRS) are used throughout for the descriptions and search examples, and their bibliographic databases inevitably receive the most attention. There is a noticeable slant toward the two agencies with which several of the contributors are or were affiliated-the University of Utah (which doesn't detract from the book's objectivity) and SDC (which does). The chapters are of uneven quality and scope. Most of the obvious areas are covered-the available search systems and databases; equipment needs; search techniques; managing an online service in a library; training searchers; promoting service; and measurement and evaluation. Taken as a whole, the book is a good state-of-the-art report, even though it is already becoming outdated in terms of industry facts. The numerous charts and tables serve to flesh out the text, but do we really need six photographs of terminals (two of them showing the same searcher at the same terminal, the only difference being that in one there is an onlooker) to illustrate that "some searchers prefer to have the user present"? Brief chapters on the growing network of online user groups, and on the future of online services (largely derived from Lancaster), end the text, and the book has a serviceable bibliography, glossary, and index. Six years ago I reviewed one of the first KIPI publications; it was in typescript, comb-bound, a little more than one hundred pages, and it cost $24.50. This is a much better production and, considering inflation since 1975, it represents vastly better value for money. It should serve as a useful handbook for those of us in the field, as well as those just starting, for another year or two.-Peter Watson, California State University, Chico.

Basics of Online Searching, by Charles T. Meadow and Pauline Atherton Cochrane. New York: Wiley, 1981. 245p. $15.95. LC: 80-23050. ISBN: 0-471-05283-3.
The use of online information retrieval services is becoming widespread throughout the information community, whether in traditional libraries or in business, industry, or government offices. The need for trained searchers is evident from the job advertisements and the quantity of training programs being offered around the country. The programs presented by the Machine-Assisted Reference Section (MARS) of the Reference and Adult Services Division of ALA are always packed. The librarians attending ALA annual conferences seem to be hungry for any information available about online information retrieval services. This text fills an obvious need for the professional who attended library school before course offerings in online information retrieval were available. Although online information retrieval is now being taught in most library and information science curriculums, there have been only a few attempts at providing a textbook for beginning students, and none of those has been very successful since the Lancaster and Fayen Information Retrieval On-Line in 1973. Basics of Online Searching is a text intended "to teach the principles of interactive bibliographic searching . . . to those with little or no prior experience. The major intended audiences are students, working information specialists and librarians, and end users, the people for whom all this searching is done." Because the authors have done an excellent job of targeting their audience and sticking to that target, this text will be useful at the introductory level. The authors cover the elements of interactive searching including the reference interview, Boolean logic, search strategy development, telecommunications and equipment, basic database structure, selective dissemination of information, and how to get help from search-service vendors. The text is relatively free of jargon and does a good job of defining new terms in context as they appear.
The authors begin with basic definitions and a brief overview of the process of interactive searching. The reference interview and search strategy development are covered adequately, first with an introduction and then in a later chapter providing more detailed information. Telecommunications and computer equipment are covered in enough detail for the novice. The next five chapters cover search language, databases, various types of text searching, and how to get on and off the computer. This section of the book uses examples that show the different approaches to the same process on three different systems-BRS, ORBIT, and DIALOG. The authors do not lose sight of their intent to demonstrate the principles of online searching. There is a brief chapter on selective dissemination of information (SDI) and cross-file searching. The chapter explains how SDI is used and gives examples of constructing and saving a search for SDI on each of the three systems. The last chapter of the book, "Search Strategy," is especially good. There seemed to be something beyond the basic elementary information of the preceding chapters. The authors clearly demonstrate concept development and search strategy formulation. The authors do an excellent job of integrating the discussion of the three major search service vendors: Lockheed's DIALOG, System Development's ORBIT, and Bibliographic Retrieval Services, Inc. Examples are used from each of the services with a discussion of the differences. The book does clarify the similarity of the services by showing how each function can be accomplished on each system. Searchers using only one system now might use this text to see how easily their knowledge could be transferred to another system. Problems with the text do not abound, but there are some that should be brought to the attention of the reader. There is a slight problem with the format of the examples.
The reviewer found herself searching for the completion of a paragraph of text on a few occasions. The examples are very good and clear; they are simply not separated from the text adequately for easy reading. There were a couple of instances of unnecessary redundancy. There were two separate discussions, one on truncation and one on searching word fragments, which could have been improved by integration into one section. There was a repetition of "steps in the presearch interview and the online search" in chapter 3 and then again in chapter 12. This is almost a page of steps, which are very good, but a simple reference back to the earlier list would have sufficed. But the biggest problem with the text in the eyes of this reviewer is that of omission. There was no discussion of citation searching or evaluation of search results, and no mention of the various training options available for the novice searcher. This reviewer would like to have seen more information on where to go next as guidance to the novice. The one hundred pages of appendixes seem unnecessary and will soon be out of date. Library school teachers planning to use this as a text would do well to request free, up-to-date materials rather than relying upon the documents in the appendix, which are more than a year old at the time of this writing. Almost every book on this topic has made the same mistake of reprinting search-service and database-producer literature. Overall, however, the authors have succeeded very capably in their intended endeavor "to teach principles, rather than the detailed mechanics of any particular search system." There is a place in the literature for this very basic text, which is well written, uses clear examples, and teaches in an understated way. For those people who are afraid of automation, afraid to touch a computer terminal, and are insecure about their ability to do online searching, this book will relieve most of those fears and insecurities.
The authors acknowledge their desire to give simple instructions and offer a chapter called "Assistance" for people who need more help. Novices might assume they could read this book, purchase a terminal, get a password and system manual, and begin searching. As a matter of fact one could do this, but the results would likely be a discredit to the search-service vendor because of a lack of system-specific training on the part of the searcher. Most people, like this reviewer, can conceptualize a new process, but would feel more comfortable with some type of formal hands-on training-even for half a day. There are too many little things that can be an impediment to success. The reviewer would heartily recommend this book to inexperienced searchers and library school students but would warn experienced searchers that there is nothing new for them.-Carolyn M. Gray, Western Illinois University, Macomb.

Quick•Search Cross-System Database Search Guides. San Jose, Calif.: California Library Authority for Systems and Services, 1980. 21 charts. $75 (CLASS members), $95 (nonmembers). ISBN: 0-938098-00-4.

The CLASS On-Line Reference Service (COLRS) is a cooperative program for public, academic, and special libraries offering training and consultation on almost any aspect of online reference searching through the major commercial vendors of databases. This service is a part of CLASS, the California Library Authority for Systems and Services, and acts as a contact point for searchers and the database industry through vendor-training sessions, database training, and the coordination of large group contracts with DIALOG Information Services and Bibliographic Retrieval Services (BRS). This close relationship to the online industry gives CLASS a unique position from which to supply information on databases from a multiple search-system perspective.
The publication of the Quick•Search Cross-System Database Search Guides is a natural outgrowth of the COLRS program in training and consulting. The twenty-one charts in Quick•Search show the formats used to search for information in a specific database across the two or three vendors offering the database commercially. The databases were selected as the most commonly searched through the major commercial search services: Bibliographic Retrieval Services, DIALOG Information Services, and System Development Corporation Search Service (SDC). Eight databases in the sciences, eight in the social sciences, and five multidisciplinary files are included in the complete set. Two subsets, of the science and multidisciplinary files and of the social science and multidisciplinary files, are available for $60 for CLASS members and $80 for nonmembers. The eight science databases are BIOSIS, CAB Abstracts, COMPENDEX, Energyline, Enviroline, Food Science & Technology Abstracts, INSPEC, and Oceanic Abstracts. The social science files are ABI/INFORM, ERIC, Exceptional Child Education Resources, Library and Information Science Abstracts, Management Contents, Psychological Abstracts, Social Scisearch, and U.S. Political Science Documents. The multidisciplinary databases are Conference Papers Index, Comprehensive Dissertation Index, NTIS, PAIS International, and SSIE Current Research. The stated purpose of the Quick•Search guides is to aid the experienced searcher who must use databases from more than one search service by showing the formats for each vendor of a database side by side for comparison. Because most searchers tend to use a database on only one system, the guides are really more appropriate to an organization where several searchers may be using the same database through different systems and a "universal" quick-reference chart is needed.
Because each guide covers only one database, the level of detail shown is much greater than in the simple-command comparison charts previously published. The guides are arranged to show particular features of the databases as they are used on the different search systems. The file label used to access the database and those fields that are searched when a term is entered with no restriction (the basic index) are shown at the top of each chart. The fields used in subject searching follow and show the field codes used to restrict subject searches, along with the format used online to enter search terms. The typical fields illustrated are title, subject descriptor, identifier, abstract, and category or section code. These fields vary according to database, but include the majority of subject access points used in the file. The balance of the chart is used to illustrate the field codes and formats used to retrieve information from other access points in the database such as author, journal source, language, publication date, document type, report numbers, or update code. These alternate access points vary widely by database, but each chart provides information on limiting searches by date, language, or update code at a minimum. The guides supply a useful amount of information for the experienced searcher needing a prompt on a form of entry for the fields available in a database, but a good understanding of the search system is required to use them properly. Given the close contact CLASS has with the database producers and online vendors, it is somewhat surprising to find inaccuracies and some misinterpretation in some of the guides. In the preface, for instance, the editor states, "In many BRS files, UJ and UN are paragraph labels used in addition to DE, MJ, and MN. They are used to indicate major (UJ) or minor (UN) single word descriptors, similar to the DF in DIALOG and IW in ORBIT."
It is true that DF is used in DIALOG to indicate a single-word descriptor, but in ORBIT the code is IT. In BRS, UJ and UN mean the term so restricted is an "unbound" part of a multiword descriptor-not a single-word descriptor (see BRS/ERIC database guide, p. 14). The use of IW in ORBIT retrieves "unbound" words from the IT field. The most trouble in the charts appears to be in the ORBIT sections. The basic index is misrepresented in several files, and the IW field is only irregularly listed, even when it is present in the SDC version of the database. Suggestions on the use of SENSEARCH and STRINGSEARCH are not consistently illustrated for fields that cannot be directly restricted in some databases on ORBIT, such as abstract or supplementary index terms. Many times the suggested search entry would not restrict retrieval to the field indicated on the chart. These inaccuracies would probably not doom an experienced searcher to failure in using a database, but they are annoying and do little to inspire absolute confidence in the information presented. CLASS is to be complimented on the graphic representations in Quick•Search and the heavy stock used for the guides (the paper will probably outlive the information printed on it). Addenda are planned for those databases changed or reloaded since the preparation of Quick•Search in October 1980, and a second edition is already under consideration. The Quick•Search guides are not meant as a replacement for vendor or database documentation and, in fact, are simply repackaged versions of the basic file descriptions available from the online vendors. Considering the price of this publication, organizations would do well to consider investing instead in detailed user guides and updates for their searchers in order to provide the most accurate and current information on databases on a specific system.-Rod Slade, University of Oregon Library, Eugene.

Viewdata and Videotext, 1980-81: A Worldwide Report.
Transcript of Viewdata '80, First World Conference on Viewdata, Videotex, and Teletext, London, March 26-28, 1980. White Plains, N.Y.: Knowledge Industry Publications, 1980. 623p. $75 softcover. LC: 80-18234. ISBN: 0-914236-77-6.

Videotex '81. Proceedings of Videotex '81 International Conference and Exhibition, May 20-22, 1981, Toronto, Canada. Northwood Hills, Middlesex, U.K.: Online Conferences Ltd., 1981. 470p. $85 softcover.

Viewdata '80 and Videotex '81 were two state-of-the-art conferences for the emerging videotex field. Videotex is the generic name for mass-market, consumer-oriented information retrieval systems of low cost and relative ease of use. Videotex, as a technology, is divided into teletext systems and viewdata systems. Teletext systems sequentially broadcast information using a portion of the television signal. Subscribers, using a special decoder, can select individual pages from the several hundred offered. Viewdata systems, on the other hand, are quite like online information systems except for their use of a television as a display device, their simplicity, and their broader range of transactions and information. These conference proceedings will be of interest to a limited audience. They are not for the complete beginner. Nor will they provide hours of entertaining reading. Neither meets academic publication criteria; many of the papers are fluff, outlines, or sales pitches. Both proceedings have their share, unfortunately large, of uninformative articles. But if you are seriously interested in videotex's technology, uses, and social implications, then by all means at least skim the 1981 conference papers. The proceedings do describe the state of the art. Moreover, the two proceedings, taken together, show some of the changes in the videotex field in the last year ... and not only in the spelling of "videotex." As state of the art, the Viewdata '80 conference proceedings are already superseded.
Most of the material has been adequately covered by now in other publications at a much lower cost. There are two exceptions to this, both worth noting. The proceedings has several excellent articles on the Japanese CAPTAIN system, the best published on that system. Of additional interest is a report on Control Data Corporation's (CDC) market test of their PLATO educational system. Their report suggests a large consumer market for high-quality educational services even at a relatively high price. The Videotex '81 conference proceedings are, of course, more current. There are four major topics of interest in the proceedings. Firstly, there are several good presentations on videotex services, such as electronic publishing, retailing, and banking. There is an excellent discussion on what videotex means to newspapers, both in opportunities and threats. Secondly, and particularly recommended, is a paper by Tydeman and Zwimpfer of the Institute for the Future. The paper outlines some of the social changes and problems that may result from large-scale videotex implementation. Thirdly, there are updates on the existing videotex technologies and efforts from the French, Japanese, Canadian, and British groups. The British are perhaps the most interesting, since they have a year of operational experience with their viewdata system, Prestel. They state that most usage was from the business community, and their reports suggest that services are shifting to attract that market. If this is the case, it is a significant change from the original consumer orientation. There is also a good article on a Prestel information provider's first year. Of additional interest is that Prestel-compatible databases and systems are being constructed in Britain. Thus, people will be able to access different systems using the same protocol. Finally, there are numerous fascinating papers on American efforts.
The Americans, in contrast to the British, seem very unsettled; there is still a multiplicity of designs. (AT&T's decision on a modified Telidon standard, not reported in the proceedings but a major event of the conference, may ameliorate that.) The papers indicate overall that the "classic" definitions of viewdata and teletext will crumble or will be supplemented in the face of 100-channel, two-way cable systems. Several papers document how these new cable capabilities will provide channels for large amounts of information to be delivered by teletext, viewdata, or hybrid systems. A paper by Simon notes that cable will not only provide large audiences for information services but will also eliminate some of the traditionally defined viewdata functions. For example, people will not buy commodity prices from a viewdata service if that same information is available on a cable channel at a lower price. Unfortunately, there are some topics missing from the 1981 conference proceedings. Consumer-oriented educational services are mentioned little. System-performance or human-factor considerations are rarely analyzed. There is much discussion of what services should be offered, but there is little discussion of how those services should be offered. No presentation is made on how to design very large databases for ease of use. Particularly distressing is the relative omission of the word "quality" from the American papers in both proceedings. One cannot expect every home to be wired to access the entire Library of Congress. Nonetheless, one can hope that videotex will not become merely a medium for used-car advertising.-Mark S. Ackerman, Department of Computer and Information Science, Ohio State University, and OCLC, Inc., Columbus.
A Candid Look at Collected Works: Challenges of Clustering Aggregates in GLIMIR and FRBR

Gail Thornburg

Information Technology and Libraries | September 2014

Abstract

Creating descriptions of collected works in ways consistent with clear and precise retrieval has long challenged information professionals. This paper describes problems of creating record clusters for collected works and distinguishing them from single works: design pitfalls, successes, failures, and future research.

Overview and Definitions

The Functional Requirements for Bibliographic Records (FRBR) was developed by the International Federation of Library Associations (IFLA) as a conceptual model of the bibliographic universe. FRBR is intended to provide a more holistic approach to retrieval and access of information than any specific cataloging code. FRBR defines a work as a distinct intellectual or artistic creation. Put very simply, an expression of that work might be published as a book. In FRBR terms, this book is a manifestation of that work.1 A collected work can be defined as "a group of individual works, selected by a common element such as author, subject or theme, brought together for the purposes of distribution as a new work."2 In FRBR, this type of work is termed an aggregate or "manifestation embodying multiple distinct expressions."3 Žumer describes an aggregate as "a bibliographic entity formed by combining distinct bibliographic units together."4 Here the terms are used interchangeably. In FRBR, the definition of aggregates applies only to group 1 entities, i.e., not to groups of persons or corporate bodies.
the ifla working group on aggregates has defined three distinct types of aggregates: (1) collections of expressions, (2) aggregates resulting from augmentation or supplementing of a work with additional material, and (3) aggregates of parallel expressions of one work in multiple languages.5 while noting the relationships between the categories, this paper will focus on the first type. aggregates of the first type include selections, anthologies, series, books with independent sections by different authors, and so on. aggregates may occur in any format, from a volume containing both of the j. d. salinger works catcher in the rye and franny and zooey, to a sound recording containing popular adagios from several composers, to a video containing three john wayne movies. gail thornburg (thornbug@oclc.org) is consulting software engineer and researcher at oclc, dublin, ohio.

the environment
the oclc worldcat database is replete with bibliographic records describing aggregates. it has been estimated that the database may contain more than 20 percent aggregates.6 this proportion may grow as worldcat coverage of recordings and videos increases. in the global library manifestation identifier (glimir) project, automatic clustering of the records into groups of instances of the same manifestation of a work was devised. glimir finds and groups similar records for a given manifestation and assigns two types of identifiers for the clusters. the first type is a manifestation id, which identifies parallel records differing only in language of cataloging or metadata detail, some of which are probably true duplicates that cannot safely be merged by a machine process. the second type is a content id, which describes a broader clustering, for instance, physical and digital reproductions and reprints of the same title from differing publishers.
this process started with the searching and matching algorithms developed for worldcat. the glimir clustering software is a specialization of the matching software developed for the batch loading of records to worldcat, deduplicating the database, and other search and comparison purposes.7 this form of glimirization compares an incoming record to database search results to determine what should match for glimir purposes. this is a looser match in some respects than what would be done for merging duplicates. the initial challenges of tailoring matching algorithms to suit the needs of glimir have been described in thornburg and oskins8 and in gatenby et al.9 the goals of glimir are (1) to cluster together different descriptions of the same resource and to get a clearer picture of the number of actual manifestations in worldcat so as to allow the selection of the most appropriate description, and (2) to cluster together different resources with the same content to improve discovery and delivery for end users. according to richard greene, “the ultimate goal of glimir is to link resources in different sites with a single identifier, to cluster hits and thereby maximize the rank of library resources in the web sphere.”10 glimir is related conceptually to the frbr model. if the goal of frbr is to improve the grouping of similar items for one work, then glimir similarly groups items within a given work. manifestation clusters specify the closest matches. content clusters contain reproductions and may be considered to represent elements of the expression level of the frbr model. the frbr and glimir algorithms this paper discusses have evolved significantly over the past three years. in addition, it should be recognized that the frbr algorithms use a map/reduce keyed approach to cluster frbr works and some glimir content while the full glimir algorithms use a more detailed and computationally expensive record comparison approach. 
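as described above, the frbr batch process clusters by constructing keys, while full glimir matching does a more detailed record-to-record comparison. the difference in cost and behavior can be sketched roughly as follows (a hypothetical illustration; the real algorithms use far richer comparison rules than a single normalized key):

```python
from collections import defaultdict

def cluster_by_key(records, make_key):
    # cheap keyed pass (frbr-style map/reduce): one pass over the records;
    # records that share a normalized key fall into the same candidate cluster
    clusters = defaultdict(list)
    for rec in records:
        clusters[make_key(rec)].append(rec)
    return list(clusters.values())

def cluster_pairwise(records, same_manifestation):
    # expensive pass (glimir-style): detailed record-to-record comparison,
    # quadratic in the worst case but able to weigh many fields at once
    clusters = []
    for rec in records:
        for cluster in clusters:
            if same_manifestation(rec, cluster[0]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

# illustrative key: author plus title normalized for case and punctuation
records = [
    {"author": "homer", "title": "The Iliad"},
    {"author": "homer", "title": "the iliad."},
    {"author": "homer", "title": "The Odyssey"},
]
make_key = lambda r: (r["author"], r["title"].lower().strip(" ."))
print([len(c) for c in cluster_by_key(records, make_key)])  # [2, 1]
```

the keyed pass scales to a database the size of worldcat, which is presumably why the frbr algorithms use it; the pairwise comparison is reserved for the cases where a single key cannot capture the match decision.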
the frbr batch process starts with worldcat enhanced with additional authority links, including the production glimir clusters. it makes several passes through worldcat, each pass constructing keys that pull similar records together for comparison and evaluation. as described by toves, “successive passes progressively build up knowledge about the groups allowing us to refine and expand clusters, ending up with the work, content and manifestation clusters to feed into production.”11 each approach to clustering has its limits of feasibility, but the combined frbr and glimir teams have endeavored to synchronize changes to the algorithms and to share insights. some materials are easier to cluster with one approach, and some with the other.

clustering meets aggregates
in the initial implementation of glimir, the issue of handling collected works was considered out of scope for the project. with experience, the team realized there can be no effective automatic glimir clustering if collected works are not identified and handled in some way. why is this? suppose a record exists for a text volume containing work a. this matches a record containing work a but actually also containing work b. that record in turn matches a work containing b and also containing works c, d, and e. the effect is a snowballing of cluster members that serves no one. how could this happen? in a bibliographic database such as worldcat, items representing collected works can be catalogued in several ways. efforts to relax matching criteria in just the right degree to cluster records for the same work are difficult to devise and apply. the glimir and frbr teams consulted several times to discuss clustering strategies for works, content, and manifestation clusters. practical experience with glimir led to rounds of enhancements and distinctions to improve the software’s decisions.
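the snowballing effect described above is essentially transitive merging: any shared work pulls two clusters together, so a chain of overlapping collected works fuses into one giant cluster. a minimal sketch (hypothetical; records are reduced here to the set of works they contain):

```python
def snowball(records):
    # naive transitive clustering: merge each record into every existing
    # cluster it overlaps, chaining overlaps into one ever-larger cluster
    clusters = []
    for works in records:
        merged = set(works)
        keep = []
        for cluster in clusters:
            if cluster & merged:      # any shared work triggers a merge
                merged |= cluster
            else:
                keep.append(cluster)
        keep.append(merged)
        clusters = keep
    return clusters

# the example from the text: {a}, {a, b}, {b, c, d, e} fuse into one cluster
print(snowball([{"a"}, {"a", "b"}, {"b", "c", "d", "e"}]))
```

the record for work a alone ends up clustered with works it does not contain at all, which is exactly the behavior the team set out to prevent.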
while glimir clusters can be and have been undone and redone on more than one occasion, it took experience for the team to realize that the clues to a collected work must be recognized.

bible and beowulf
as with many initial production startups, the output of glimir processing was monitored. reports for changes in any clusters of more than fifty members were reviewed by quality control catalogers for suspicious combinations. and occasionally a library using a glimir- or frbr-organized display would report a strange cluster. this was the case with a huge malformed cluster of records for the bible. such a work set tends to be large and unmanageable by nature; there are a huge number of records for the bible in worldcat. however, it was noticed that the set had grown suddenly over the previous two months. user interface applications stalled when attempting to present a view organized by such a set. one day, a local institution reported that a record for beowulf had turned up in this same work set. this started the team on an investigation. after much searching and analysis of the members of this cluster, the index case was uncovered. in many cases bibliographic records are allowed to cluster based on a uniform title. what the team found connecting these disparate records was a totally unexpected use of the uniform title, field 240 subfield a, contents: “b.”. that’s right, “b.”. once the first case was located, it was not hard to figure out that there were numerous uniform “titles” with other single letters of the alphabet. so in this odd usage, bible and beowulf could come together if insufficient data were present in two records to discriminate by other comparisons, as could, potentially, other titles that started with “b.” seeing this unanticipated use of the uniform title field, the frbr and glimir algorithms were promptly modified to beware of it. the frbr and glimir clusters were then unclustered and redone.
this was a data issue, and unanticipated uses of fields in a record will crop up, if usually with less drama. further experience showed more. in the examination of another ill-formed cluster, a reviewer realized that one record had the uniform title stated as “iliad” but the item title was homer’s “odyssey.” of course these have the same author, and may easily have the same publisher. even the same translator (e.g., richmond lattimore) is not improbable for a work like this. this was a case of bad data, but it imploded two very large clusters.

music and identification of collected works
as music catalogers know, musical works are very frequently presented in items that are collections of works. the rules for creating bibliographic records for music, whether scores or recordings or other, are intricate. the challenges to software to distinguish minor differences in wording from critical differences seem to be endless. moreover, musical sound recordings are largely collected works due to the nature of publication. as noted by papakhian, personal author headings are repeated more often in sound recording collections than in the general body of materials.12 there are several factors that may contribute to such an observation. there are likely to be numerous recordings by the same performer of different works and numerous records of the same work by different performers. composers are also likely to be performers. the point is that for sound recordings an author statement and title may be less effective discriminators than for printed materials. vellucci13,14 and riley15 have written extensively on the problems of music in frbr models. the problem of distinguishing and relating whole/part relationships is particularly tricky. musical compositions often consist of units or segments that can be performed separately, so they are generally susceptible to extraction.
these extractive relationships are seen in cases where parts are removed from the whole to exist separately, or perhaps parts for a violin or other instrument are extracted from the full score. software must be informed with rules as to significant differences in description of varying parts and varying descriptions of instruments, and in this team’s experience that is particularly difficult. krummel has noted that the bibliographic control of sound recordings has a dimension beyond item and work, that is, performance.16 different performances of the same beethoven symphony need to be distinguished. cast and performer list evaluation and date checking are done by the software. however, the comparisons the software can make are susceptible to the fullness or scarcity of data provided in the bibliographic record. there is great variation observed in the number of cast members stated in a record. translator and adapter information can prove useful in the same sense of role discrimination for other types of materials. this is close scrutiny of a record. at the same time, consider that an opera can include the creative contributions of an author (plot), a librettist, and a musical composer. yet these all come together to provide one work, not a collected work. tillett has categorized seven types of bibliographic relationships among bibliographic entities, including the following:
1. equivalence, as exact copies or reproduction of a work. photocopies and microforms are examples.
2. derivative relationships, or a modification such as variations, editions, translations.
3. descriptive, as in criticism, evaluation, review of a work.
4. whole/part, such as the relation of a selection from an anthology.
5. accompanying, as in a supplement or concordance or augmentation to a work.
6. sequential, or chronological relationships.
7.
shared characteristic relationships, as in items not actually related that share a common author, director, performer, or other role.17
while it is highly desirable for a software system to notice category 1 to cluster different records for the same work, that same software could be confused by “clues” such as those in category 7. and the software needs to understand the significance of the other categories in deciding what to group and what to split. to handle these relations in bibliographic records, tillett discusses linking devices including, for instance, uniform titles. yet uniform titles are used for the categories of equivalence relationships, whole/part relationships, and derivative relationships. this becomes more and more complex for a machine to figure out. of course, uniform titles within bibliographic records are currently supposed to link to authority records via text string only. consideration should ideally be given to linking via identifiers, as has been suggested elsewhere.18

thematic indexes
review of scores and recordings glimir clusters showed a case where haydn’s symphonies a and b were brought together. these were outside the traditional canon of the 104 haydn symphonies and were referred to as “a” and “b” by the haydn scholar h. c. robbins landon. this misclustering highlighted the need for additional checks in the software. the original glimir software was not aware of thematic indexes as a tool for discrimination. thematic indexes are numbering systems for the works of a composer. the köchel mozart catalog, as in k. 626, is a familiar example. these designations are not unique to a given composer; that is, they are intended to be unique within a given composer’s output, but identical designators may coincidentally have been assigned to multiple composers.
while “b” series numbers may be applied to works of chambonnières, couperin, dvořák, pleyel, and others, the presence of more than one b number is suggestive of collected-work status. for more on the various numbering systems, see the interesting discussion by the music library association.19 however, the software cannot merely count likely identifiers in the usual place. this could lead to falsely flagging aggregates; one work by dvořák could have b. 193, which is incidentally equivalent to opus 105. clearly, any detection of multiple identifiers of this sort must be restricted to identifiers of the same series.

string quartet number 5, or maybe 6
cases of renumbering can cause problems in identifying collected works. an early suppressed or lost work, later discovered and added to the canon of the composer’s work, can cause renumbering of the later works. clustering software must be very attentive to discrete numbers in music, but can it be clever enough? the works of paul hindemith (1895–1963) offer an example. his first string quartet was written in 1915 but long suppressed. his publisher was generally schott. long after hindemith’s death, this first quartet was unearthed and then published by schott. the publisher then renumbered all the quartets, so quartets previously 1 through 6 became 2 through 7. the rediscovered work was then called “no. 1,” though sometimes called “no. 0” to keep the older numbering intact. further, the last two quartets did not even have opus numbers assigned and were both in the same key.20 this presents a challenge.

anything musical
another problem case emerged when reviewers noticed a cluster contained both of the unrelated songs “old black joe” and “when you and i were young maggie.” on investigation, the cluster held a number of unrelated pieces. here the use of alternate titles in a 246 field had led to overclustering, and the rules for use of 246 fields were tightened in frbr and glimir.
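the same-series restriction on thematic index numbers described above might look like the following sketch (the series patterns here are invented for illustration and are much cruder than real thematic-index parsing):

```python
import re

# hypothetical patterns for a few thematic-index series
SERIES_PATTERNS = {
    "b": re.compile(r"\bb\.\s*(\d+)", re.IGNORECASE),
    "k": re.compile(r"\bk\.\s*(\d+)", re.IGNORECASE),
    "opus": re.compile(r"\bop(?:us)?\.?\s*(\d+)", re.IGNORECASE),
}

def multiple_same_series(text):
    # flag as a likely collected work only when more than one DISTINCT
    # number appears within the SAME series, not merely more than one
    # identifier overall
    for pattern in SERIES_PATTERNS.values():
        if len(set(pattern.findall(text))) > 1:
            return True
    return False

# one work carrying equivalent designators from different series: not flagged
print(multiple_same_series("serenade, b. 193 (opus 105)"))   # False
# two distinct b numbers: flagged as a likely collected work
print(multiple_same_series("symphonies b. 9 and b. 12"))     # True
```

counting identifiers across series (rather than within one series) would wrongly flag the dvořák case, where b. 193 and opus 105 name the same single work.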
as in the other problem cases, cycles of testing were necessary to estimate sufficient yet not excessive restrictions. rules too strict split good clusters and defeat the purpose of frbr and glimir. at this point the glimir/frbr team recognized that rules changes were necessary but not sufficient. that is, a concerted effort to handle collected works was essential.

strategies for identifying collected works
the greatest problem, and most immediate need, was to stop the snowballing of clusters. clusters containing some member records that are collected works can suddenly mushroom out of control. rule 1 was that a record for a collected work must never be grouped with a record for a single work. if all the records in a group are collected works, that is closer to tolerable (more on that later). with time and experimentation, a set of checks was devised to allow collected works to be flagged. these clues were categorized as two types: (1) considered conclusive evidence, or (2) partial evidence. a type 2 clue needed another piece of evidence in the record. finding the best clues was a team effort. it was acknowledged that to prevent overclustering, overidentification of aggregates was preferable to failure to identify them. several cycles of tests were conducted and reviewed, assessing whether the software guessed right. table 1 illustrates the types of checks done for a given bibliographic record. here “$” is used as an abbreviation for subfield, and “ind” equals indicator.

area | field | rule | notes
uniform title | 240 | $a and no $m, $n, $p, or $r: a title in $a on a list of terms, without the other subfields listed, is a collected work. | this is a long list of terms such as “symphonies,” “plays,” “concertos,” and so on.
title | 245 | contains “selections”: is collected. |
title | 245 | 245 with multiple semicolons and doc type “rec”: is collected. |
title | 246 | if four or more 246 fields with ind2 = 2, 3, or 4: is collected. | if more than one 246, consider partial evidence.
extent | 300 | if 300 $a has “pagination multiple” or “multiple pagings”: is collected. |
contents notes | 505 $a and $t | 1. check $a for first and last occurrences of “movement”; if there are not multiple movement occurrences, check whether it has multiple “ / ” patterns. 2. if the above does not find multiple patterns, also look for “ ; ” patterns. 3. if the above checks do not produce more than one pattern, look for multiple “ – ” patterns. 4. count 505 $t cases. 5. count $r cases. | if all or any of the above produce more than one pattern instance, or more than one $t, or more than one $r: is collected.
various fields for thematic index clues | 505 $a | if any 505 $a, check for differing opuses (this also checks thematic index cases). if found: is collected. | for types score and recording.
related work | 740 | if one or more 740 fields and one has indicator 2 = 2: is collected. | if only multiple 740s, partial evidence.
author | 700/710/711/730 | check for $t and $n, and check 730 ind2 value of “2.” if a 730 with ind2 = 2 or multiple $t is found: is collected. | if only one $t, partial evidence.
 | 100/110/111, 700/710, 730 | if format is recording and both records are collected works, require a cast list match to cluster anything but manifestation matches. that is, do not cluster at the content level without verifying by cast. |
table 1. checks on bibliographic records.

frailties of collected works identification in well-cataloged records
the table above illustrates many areas in a bibliographic record that can be mined for evidence of aggregates. the problem is that cataloging practice offers no single mandatory rule for cataloging a collected work correctly. moreover, as worldcat membership grows, the use of multiple schemes of cataloging rules for different eras and geographic areas adds to the complexity, even assuming that all the bibliographic records are cataloged “correctly.” correct cataloging is not assumed by the team.
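the two-tier evidence scheme described in the strategies section (conclusive clues versus partial clues that need corroboration) can be expressed compactly. this sketch assumes the simplest reading, that one conclusive clue or any two pieces of partial evidence flag the record; the actual weighting used by the team is not spelled out in the text:

```python
def is_collected_work(clues):
    # clues: list of (kind, description) pairs gathered from checks like
    # those in table 1, where kind is "conclusive" or "partial"
    conclusive = sum(1 for kind, _ in clues if kind == "conclusive")
    partial = sum(1 for kind, _ in clues if kind == "partial")
    # one conclusive clue suffices; a partial clue needs another clue
    return conclusive >= 1 or partial >= 2

print(is_collected_work([("conclusive", '245 contains "selections"')]))  # True
print(is_collected_work([("partial", "more than one 246 field")]))       # False
print(is_collected_work([("partial", "more than one 246 field"),
                         ("partial", "multiple 740 fields")]))           # True
```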
software confounded
with all the checks outlined in the table, the team still found cases of collected works that seemed to defy machine detection. one record had two separate works, tom sawyer and huckleberry finn, in the same title field, with no other clues to the aggregate nature of the item. the work brustbild was another case. for this electronic resource set, brustbild appeared to be the collection set title, but the specific title for each picture was given in the publisher field. a cluster for the work gedichte von eduard morike (score) showed problems with the uniform title, which was for the larger work, while the cluster records each actually represented parts of the work. the bad cluster for si ku quan shu zhen ben bie ji, an electronic resource, contained records that each appeared to represent the entire collection of 400 volumes, but the link in each 856 field pointed only to one volume in the set.

limitations of the present approach
the current processing rules for collected works adopt a strategy of containment. the problem may be handled in the near term by avoiding the mixing of collected works with noncollected works, but the clusters containing collected works need further analysis to produce optimal results. for example, it is one thing to notice “arrangements” in scores as a clue to the presence of an aggregate. the requirement also exists that an arrangement should not cluster with the original score. the rules for clustering and distinguishing different sets of arrangements present another level of complexity. checks to compare and equate the instruments involved in an arrangement are quite difficult; in this team’s experience, they fail more often than they succeed. without initial explication of the rules for separating arrangements, reviewers quickly found clusters such as haydn’s schopfung, which included records for the full score, vocal score, and an arrangement for two flutes.
an implementation that expects one manifestation to have the identifier of only one work is a conceptual problem for aggregates. a simple case: if the description of a recording of bernstein’s mass has an obscurely placed note indicating the second side contains the work candide, mass is likely to be dominant in the clustering effect, with the second work effectively “hidden.” this manifestation would seem to need three work ids: one for the combination, one for mass, and one for candide. this does not easily translate to an implementation of the frbr model but could perhaps be achieved via links. several layers of links would seem necessary. a manifestation needs to link to its collected work. a collected work needs links to records for the individual works that it contains, and vice versa: individual works need to link to collective works. this can be important for translations, for example into russian, where collective works are common even where they do not exist in the original language.

lessons learned
first and foremost, plan to deal with collected works. for clustering efforts this must be addressed in some way for any large body of records. second, formats deserve focused attention. the initial implementation of the glimir algorithms used test sets mainly composed of a specific work. after all, glimir clusters should all be formed within one work. these sets were carefully selected to represent as many different types of work sets as possible, whether clear or difficult examples of work set members. plenty of attention was given to the compatibility of differing formats, given the looser content clustering. these were good tests of the software’s ability to cluster effectively and correctly within a set that contained numerous types of materials. random sets of records were also tested to cross-check for unexpected side effects.
in retrospect, the team would have expanded the sets focused on specific formats. recordings, scrutinized as a group, can show different problems than scores or books. the distinctions to be made are probably not complete. another lesson learned in glimir concerned the risks of clustering. the deliberate effort to relax the very conservative nature of the matching algorithms used in glimir was critical to success in clustering anything. singleton clusters don’t improve anyone’s view. in the efforts to decide what should and should not be clustered, it was initially hard to discern the larger-scale risks of overclustering. risks from sparse records were probably handled fairly well in this initial effort, but risks from complex records needed more work. collected works are only one illustration of the risks of overclustering.

future research
the current research suggests a number of areas for possible further exploration:
• the option for human intervention to rearrange clusters not easily clustered automatically would seem to be a valuable enhancement.
• there is next the general question: what sort of processing is needed, and feasible, to distinguish the members of clusters flagged as collected works?
• part-versus-whole relationships can be difficult to distinguish from the information in bibliographic records. further investigation of these descriptions is needed.
• arrangements of works in music are so complex as to suggest an entire study by themselves. work in this area is in progress, but it needs rules investigation.
• other derivative relationships among works: do these need consideration in a clustering effort? can and should they be brought together while avoiding overclustering of aggregates?
• how much clustering of collected works may actually be helpful to persons or processes searching the database? how can clusters express relationships to other clusters?
conclusion
clustering bibliographic records in a database as large as worldcat takes careful design and undaunted execution. the navigational balance between underclustering and overclustering is never easy to maintain, and course corrections will continue to challenge the navigators.

acknowledgments
this paper would have been a lesser thing without the patient readings by rich greene, janifer gatenby, and jay weitz, as well as their professional insights and help in clarifying cataloging points. special thanks to jay weitz for explicating many complex cases in music cataloging and music history.

references
1. barbara tillett, “what is frbr? a conceptual model for the bibliographic universe,” last modified 2004, accessed november 22, 2013, http://www.loc.gov/cds/frbr.html.
2. janifer gatenby, email message to the author, november 10, 2013.
3. international federation of library associations (ifla) working group on aggregates, final report of the working group on aggregates, september 12, 2011, http://www.ifla.org/files/assets/cataloguing/frbrrg/aggregatesfinalreport.pdf.
4. maja zumer and edward t. o’neill, “modeling aggregates in frbr,” cataloging and classification quarterly 50, no. 5–7 (2012): 456–72.
5. ifla working group on aggregates, final report.
6. zumer and o’neill, “modeling aggregates in frbr.”
7. gail thornburg and w. michael oskins, “misinformation and bias in metadata processing: matching in large databases,” information technology & libraries 26, no. 2 (2007): 15–22.
8. gail thornburg and w. michael oskins, “matching music: clustering versus distinguishing records in a large database,” oclc systems and services 28, no. 1 (2012): 32–42.
9. janifer gatenby et al., “glimir: manifestation and content clustering within worldcat,” code{4}lib journal 17 (june 2012), http://journal.code4lib.org/articles/6812.
10. richard o. greene, “cataloging alchemy: making your data work harder” (slideshow presented at the american library association annual meeting, washington, dc, june 26–29, 2010), http://vidego.multicastmedia.com/player.php?p=ntst323q.
11. jenny toves, email message to the author, december 17, 2013.
12. arsen r. papakhian, “the frequency of personal name headings in the indiana university music library card catalogs,” library resources & technical services 29 (1985): 273–85.
13. sherry l. vellucci, bibliographic relationships in music catalogs (lanham, md: scarecrow, 1997).
14. sherry l. vellucci, “frbr and music,” in understanding frbr: what it is and how it will affect our retrieval tools, ed. arlene g. taylor (westport, ct: libraries unlimited, 2007), 131–51.
15. jenn riley, “application of the functional requirements for bibliographic records (frbr) to music,” www.dlib.indiana.edu/~jenlrile/presentations/ismir2008/riley.pdf.
16. donald w. krummel, “musical functions and bibliographic forms,” the library, 5th ser. 31 (1976): 327–50.
17. barbara tillett, “bibliographic relationships: toward a conceptual structure of bibliographic information used in cataloging” (phd diss., graduate school of library & information science, university of california, los angeles, 1987), 22–83.
18. program for cooperative cataloging (pcc) task group on the creation and function of name authorities in a non marc environment, “report on the pcc task group on the creation and function of name authorities in a non marc environment,” last modified 2013, http://www.loc.gov/aba/pcc/rda/rda%20task%20groups%20and%20charges/reportpcctgonnameauthina_nonmarc_environ_finalreport.pdf.
19. music library association, authorities subcommittee of the bibliographic control committee, “thematic indexes used in the library of congress/naco authority file,” http://bcc.musiclibraryassoc.org/bcc-historical/bcc2011/thematic_indexes.htm.
20. jay weitz, email message to the author, may 6, 2013.
book reviews
networks and disciplines: proceedings of the educom fall conference, october 11-13, 1972, ann arbor, michigan. princeton: educom, 1973. 209p. $6.00.
as with so many conferences, the principal beneficiaries of this one are those who attended the sessions, and not those who will read the proceedings. except for a few prepared papers, the text is a somewhat edited version of verbatim, ad lib summaries of a number of workshop sessions and two panels that purport to summarize common themes and consensus. since few people are profound in ad lib commentaries, the result is shallow and repetitive. the forest of themes is completely lost among a bewildering array of trees. the conference was, i am sure, exciting and thought-provoking for the participants. it was simply organized, starting with statements of networking activities in a number of disciplines, i.e., chemistry, language studies, economics, libraries, museums, and social research. the paper on economics is by far the best-organized presentation of the problems and potential of computers in any of the fields considered, and perhaps the best short presentation yet published for economics. the paper on libraries was short, that on chemistry lacking in analytical quality, that on language provocative, that on social research highly personal, and that on museums a neat mixture of reporting and interpreting.
Much of the information is conditional; that is, it described what might or could be in the realm of the application of computers to the various subjects. The speakers all directed their papers to the concept of networks, interpreted chiefly as widespread remote access to computational facilities. The papers are followed by very brief transcripts of the summaries of workshops in which the application of computers to each of the disciplines was presumably discussed in detail. Much of each summary is indicative and not really informative about the discussions. The concluding text again is the transcript of two final panels on themes and relationships among computer centers. The only description for this portion of the text is turgid.

In the midst of all this is the banquet paper presented by Ed Parker, who as usual was thoughtful and insightful, and several presentations by National Science Foundation officials that must have been useful at the time to guide those relying on federal funding for computer networks in developing proposals. I can't think of another reference that touches on the potential of computers in so many different disciplines, but it is apparent from the breadth of ideas and the range of suggested or tested applications that a coherent and analytical review should be done. This volume isn't it.

Russell Shank
Smithsonian Institution

The Analysis of Information Systems, by Charles T. Meadow. Second edition. Los Angeles: Melville Publishing Co., 1973. A Wiley-Becker & Hayes Series book.

This is a revised edition of a book first published in 1967. The earlier edition was written from the viewpoint of the programmer interested in the application of computers to information retrieval and related problems. The second edition claims to be "more of a textbook for information science graduate students and users" (although it is not clear who these "users" are).
Elsewhere the author indicates that his emphasis is on the "software technology of information systems" and that the book is intended "to bridge the communications gap among information users, librarians and data processors." The book is divided into four parts: language and communication (dealing largely with indexing techniques and the properties of index languages); retrieval of information (including retrieval strategies and the evaluation of system performance); the organization of information (organization of records, of files, of file sets); and computer processing of information (basic file processes, data access systems, interactive information retrieval, programming languages, generalized data management systems).

The second two sections are, I feel, much better than the first. These are the areas in which the author has had the most direct experience, and the topics covered, at least in their information retrieval applications, are not discussed particularly well or particularly fully elsewhere. It is these sections of the book that make it of most value to the student of information science.

I am less happy about Meadow's discussion of indexing and index languages, which I find unclear, incomplete, and inaccurate in places. The distinction drawn between pre-coordinate and post-coordinate systems is inaccurate; Meadow tends to refer to such systems simply as keyword systems, although it is perfectly possible to have a post-coordinate system based on, say, class numbers, which can hardly be considered keywords, while it is also possible to have keyword systems that are essentially pre-coordinate. In fact, Meadow relates the characteristic of being post-coordinate to the number of terms an indexer may use ("... permit their users to select several descriptors for an index, as many as are needed to describe a particular document"), but this is not an accurate distinction between the two types of system.
The real difference is related to how the terms are used (not how many are used), including how they are used at the time of searching. The references to faceted classification are also confusing, and a number of statements made throughout the discussion on index languages are completely untrue. For example, Meadow states (p. 51) that "a hierarchical classification language has no syntax to combine descriptors into terms." This is not at all accurate, since several hierarchical classification schemes, including UDC, do have synthetic elements which allow combination of descriptors, and some of these are highly synthetic. In fact, Meadow himself gives an example (p. 38-39) of this synthetic feature in the UDC. It is also perhaps unfortunate that the student could read all through Meadow's discussion of index languages without getting any clear idea of the structure of a thesaurus for information retrieval and how this thesaurus is applied in practice. Moreover, Meadow used Medical Subject Headings as his example of a thesaurus (p. 33-34), although this is not at all a conventional thesaurus and does not follow the usual thesaurus structure.

My other criticism is that the book is too selective in its discussion of various aspects of information retrieval. For example, the discussion of automatic indexing is by no means a complete review of techniques that have been used in this field. Likewise, the discussion of interactive systems is very limited, because it is based solely on NASA's system, RECON. The student who relied only on Meadow's coverage of these topics would get a very incomplete and one-sided view of what exists and what has been done in the way of research. In short, I would recommend this book for those sections (p. 183-412) that deal with the organization of records and files and with related programming considerations.
The author has handled these topics well and perhaps more completely, in the information retrieval context, than anyone else. Indexing and index languages, on the other hand, are subjects that have been covered more completely, clearly, and accurately by various other writers. I would not recommend the discussion on index languages to a student unless it is read in conjunction with other texts.

F. W. Lancaster
University of Illinois

Application of Computer Technology to Library Processes, a Syllabus, by Joseph Becker and Josephine S. Pulsifer. Metuchen, N.J.: Scarecrow Press, 1973. 173p. $5.00.

Despite the large number of institutions offering courses related to library automation, including just about every library school in North America, accredited or not, there is a remarkable shortage of published material to assist in this instruction. With the publication of this small volume a light has been kindled; let us hope it will be only the first of many, for larger numbers of better-educated librarians must surely result in higher standards in the field.

Journal of Library Automation Vol. 7/2, June 1974

This syllabus covers eight topics related to the use of computers in libraries, titled as follows: bridging the gap (librarians and automation); computer technology; systems analysis and implementation; MARC program; library clerical processes (which encompasses acquisitions, cataloging, serials, circulation, and management information); reference services; related technologies; and library networks. Each topic is treated as a unit of instruction, and each receives the identical treatment, as follows. The units each start with an introductory paragraph, explaining what the field encompasses and indicating the purpose of teaching that topic. The purpose of systems analysis, for example, is "to develop the sequence of steps essential to the introduction of automated systems into the library."
A series of behavioral objectives is then listed, to show what the student will be able to do (after he has learned the material) that he presumably was unable to do before. For example, there are seven behavioral objectives in the unit on computer technology, of which the first four are: "1) the student will be able to discuss the two-fold requirement to represent data by codes and data structures for purposes of machine manipulation, 2) the student will be able to identify the basic components of computer systems and describe their purposes, 3) the student will be able to differentiate hardware and software and describe briefly the part that programming plays in the overall computer processing operation, 4) the student will be able to define the various modes of computer operation and indicate the utility of each in library operations." The remaining three objectives refer to the student's ability to enumerate and compare types of input, output, and storage devices.

Then an outline of the instructional material is presented, followed by the detailed and well-organized material for instruction. In no case can the material presented here be considered all that an instructor would need to know about the field, but a surprising amount of specific detail is included, along with a carefully organized framework within which to place other knowledge. The end result is to present to the instructor a series of outlines that would encompass much of the material included in a basic introductory course in library automation. Every instructor would, presumably, want to add other topics of his own in addition to adding other material to the topics treated in this volume, but he has here an extremely helpful guide to a basic course, and the only work of its kind to be published to date.

Peter Simmons
School of Librarianship
University of British Columbia

The LARC Reports, Vol. 6, Issue 1.
Online Cataloging and Circulation at Western Kentucky University: An Approach to Automated Instructional Resources Management. 1973. 78p.

This is a detailed account of the design, development, and implementation of the online cataloging and circulation systems that have been in operation at Western Kentucky University for several years. The library's reasons for using computers are similar to those of many college and university libraries that experienced rapid growth during the 1960s. The faculty of the Division of Library Services first prepared a detailed proposal, with appropriate feasibility studies and cost analyses, to reclassify the collection from Dewey Decimal to Library of Congress classification. The proposal was approved by the administration of the university, and the decision was made to utilize campus computer facilities via online input techniques for reclassification, cataloging, and circulation. "Project Reclass" was accomplished during 1970-71 using IBM 2741 ATS/360 terminals. A circulation file was subsequently generated from the master record file.

The main library is housed in a new building and has excellent computer facilities within the library that are connected to the university computer center. Cataloging information is input directly into the system via ATS terminals; IBM 2260 visual display terminals are used for inquiry into the status of books and patrons; and IBM 1031/1033 data collection terminals are used to charge out and check in books. Catalog cards and book catalogs in upper/lower case are produced in batch mode on a regular schedule. The online circulation book record file is used in conjunction with the online student master record and payroll master record files for preparation of overdue and fine notices. Apparently the communication between library staff and computer personnel has been well above average, and the cooperation of the administration and other interested parties has been outstanding.
The attention given to planning, scheduling, training, and implementation is impressive. What has been accomplished to date is considered very successful, and plans are underway to develop online acquisitions ordering and receiving procedures. The report has some annoying shortcomings, such as referring to the Library of Congress as "National Library"; frequent use of the word "Xeroxing," which the Xerox Corporation is attempting to correct; "inputing" for "inputting"; and several other misspelled words. Some parts are poorly organized and unclear, but the report does provide many useful details for those considering a similar undertaking.

LaVahn Overmyer
School of Library Science
Case Western Reserve University

Letter from the Editors (March 2023)
Kenneth J. Varnum and Marisha C. Kelly
Information Technology and Libraries | March 2023
https://doi.org/10.6017/ital.v42i1.16319

Welcome to the March 2023 issue. Despite the date, snow still covers the ground where the editor lives, and winter still appears to be holding on tightly to both coasts. We're pleased to share with you the first issue of the calendar year and a collection of five peer-reviewed articles, as well as some news and updates (below). We also have a column in our Public Libraries Leading the Way series, "Virtual Production at Cloud901 in the Memphis Central Library" by Alan Ji and David Mason, about how that library has adapted cutting-edge production techniques used in streaming TV shows such as The Mandalorian to create virtual scenery in their teen-focused makerspace.
Peer-reviewed articles in the current issue are listed here:

• The Current State and Challenges in Democratizing Small Museums' Collections Online / Avgoustinos Avgousti and Georgios Papaioannou
• Services to Mobile Users: The Best Practice from the Top Visited Public Libraries in the US / Yan Quan Liu and Sarah Lewis
• Decision-Making in the Selection, Procurement, and Implementation of Alma/Primo: The Customer Perspective / Jin Xiu Guo and Gordon Xu
• Exploring Final Project Trends Utilizing Nuclear Knowledge Taxonomy: An Approach Using Text Mining / Faizhal Arif Santosa
• Japanese Military "Comfort Women" Knowledge Graph: Linking Fragmented Digital Records / Haram Park and Haklae Kim

Call for New Editorial Board Members Coming in April

The ITAL editorial board, a Core committee, will be issuing a call for volunteers in April. For those selected, two-year terms of service will start on July 1. Editorial board members have a critical role in building the foundation for the journal's future through setting policy and content guidelines. Members of the board have several key responsibilities:

• shaping the direction and strategy for the journal;
• participating in online editorial board meetings;
• soliciting contributions to the journal (based on personal networking, conference attendance, etc.); and
• optionally reviewing articles submitted to the journal, for those who want to be involved at an even deeper level (see the peer reviewer job description).

If you are interested in furthering the scholarly record for library technology and have a background in information technology in libraries, archives, or museums, this is an exciting opportunity to contribute to the profession and engage with colleagues across all types of organizations in examining the role of technology in libraries. Because we want the editorial board to reflect the broad diversity of Core's membership, we especially encourage individuals from underrepresented groups and identities to apply.
ITAL Will Move to a New Host This Summer

Over the past year, the editors of the three Core journals, ITAL, Library Leadership & Management (LL&M), and Library Resources and Technical Services (LRTS), have been working with Core and the Core Board to consolidate our journals on a single publishing platform. We're pleased to say that LL&M and ITAL will move this summer to ALA's Open Journal Systems platform, where LRTS is already published. We'll have more details to share in our June issue, before the move, but want to let you know some important details:

• ITAL's URLs will change, but DOIs will continue to resolve to the new home of the journal. We will work with our current host, Boston College, to set up redirects to the new location.
• ALA uses the same publishing platform as Boston College, Open Journal Systems, so for authors and reviewers the experience will remain the same.
• Articles published in ITAL (and our two sibling journals) will continue to be open access, with no fees charged to authors or readers. Authors maintain copyright in their work.

We are very grateful to Boston College for their support of Information Technology and Libraries over the past decade, and to the Core Board for supporting this project.

Be a Part of a Future Issue

As the U.S. academic year hurdles to a close this spring, it's a great time to think about the work you've accomplished and what you might share with your library colleagues near and far. Our call for submissions (https://ejournals.bc.edu/index.php/ital/call-for-submissions) outlines the topics of interest to the journal (basically, if the submission discusses the intersection of libraries/archives/museums and technology, it's potentially in scope) and the process for submitting an article. We'd love to consider your article for publication. Or, if you have an idea you'd like to discuss with ITAL's editors, contact either of us at the email addresses below.

Kenneth J. Varnum, Editor, varnum@umich.edu
Marisha C. Kelly, Assistant Editor, marisha.librarian@gmail.com

Content Management for the Virtual Library
Ed Salazar
Information Technology and Libraries | September 2006

Traditional, larger libraries can rely on their physical collection, coffee shops, and study rooms as ways to entice patrons into their library. Yet virtual libraries have merely their online presence to attract students to resources. This can only be achieved by providing a fully functional site that is well designed and organized, allowing patrons to navigate and locate information easily. One such technology significantly improving the overall usefulness of web sites is a content management system (CMS). Although the CMS is not a novel technology per se, it is a technology smaller libraries cannot afford to ignore. In the fall of 2004, the Northcentral University Electronic Learning Resources Center (ELRC), a small, virtual library, moved from a static to a database-driven web site. This article explains the importance of a CMS for the virtual or smaller library and describes the methodology used by the ELRC to complete the project.
State of the Virtual Library

The Northcentral University Electronic Learning Resource Center (ELRC), a virtual library, moved from a static to a database-driven web site in 2004.1 Before this, the site consisted of 450 static pages and continued to multiply due to the creation and expansion of Northcentral University (NCU) programs. To provide the type of service demanded by our internet-savvy patrons, the ELRC felt it needed to evolve to the next stage of web management and design. NCU, with a current enrollment of roughly twenty-one hundred full-time students, is one of many for-profit virtual universities (including the University of Phoenix, Capella, and Walden, among others) seeking to carve a niche in the education market by offering professional degrees entirely online.2

In the past few years, distance education has experienced exponential growth, causing virtual universities to flourish but forcing on their libraries the challenge of keeping pace.3 Typically, virtual libraries are staffed by one or two librarians who are responsible for all facets of the library, including interlibrary loan, virtual reference, library instruction, and web site management, among other library duties.4 Web site management, as expected, becomes cumbersome when a site exceeds two hundred or more static pages and a clear and structured system is not in place to maintain a proliferating number of web pages. Because virtual, for-profit libraries do not rely on public funding and taxes, they tend not to be as concerned about autonomy as public or state libraries, which must find ways to stay within budget and curtail expenses. On the same note, some academic libraries prefer to maintain a local area network (LAN), while other libraries may not have the staff, resources, or need for such a system. Thus, for some virtual libraries, such as the ELRC, the incorporation of technology takes on a more dependent role.
That is, where some libraries are encouraged to explore open source applications and create homegrown tools, the virtual, smaller-staffed library finds itself more or less reliant on its university's information technology (IT) department.5 Virtual libraries address the needs of distance education students, who demand a level of service and instruction equivalent to, if not surpassing, what they would expect to find at physical libraries.6 Meeting these needs requires a great deal of creativity, ingenuity, and a strong technical background. Recent trends in developing technologies such as MyLibrary, learning objects, blogs, virtual chat, and federated searching have broadened the scope of possibilities for the smaller-staffed, virtual library. In particular, a content management system (CMS) utilizes a combination of tools that provide numerous advantages, as outlined below:

1. the creation of templates that maintain a consistent design throughout the site
2. the convenience of adding, updating, and deleting information from a single, online location
3. the creation and maintenance of interactive pages or learning objects
4. the implementation of a simple editing interface that eliminates the need for library staff to know Extensible Hypertext Markup Language/Hypertext Markup Language (XHTML/HTML)

Simply defined, a CMS is comprised of a database; server pages such as Active Server Pages (ASP), Personal Home Page (PHP), or ColdFusion; a web server, for example, Internet Information Server (IIS), Personal Web Server (PWS), or Apache; and an editing tool to manage web content.7 These resources vary in price, but for a virtual library integrated into a larger university, it is ideal to implement applications and software supported by the university. For the autonomous academic library, this may differ.
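The database-plus-server-pages half of that definition can be sketched in a few lines. This is a minimal, hypothetical stand-in, using Python and SQLite rather than the SQL Server/ASP stack the ELRC actually used, and an invented one-table schema; it shows the core CMS idea that a single template shell pulls each page's unique content from the database.

```python
import sqlite3
from string import Template

# Hypothetical "pages" table standing in for the CMS database.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE pages (
    id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT, body TEXT)""")
db.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", [
    (1, None, "ELRC Home", "Welcome to the library."),
    (2, 1, "Course Guides", "Guides by program."),
])

# One template shell: design lives here, content lives in the database.
SHELL = Template("<html><head><title>$title</title></head>"
                 "<body><h1>$title</h1><p>$body</p></body></html>")

def render(page_id):
    # Fetch the unique content for the requested page, then fill the shell.
    title, body = db.execute(
        "SELECT title, body FROM pages WHERE id = ?", (page_id,)).fetchone()
    return SHELL.substitute(title=title, body=body)

print(render(2))  # one template serves any number of pages
```

A single change to `SHELL` restyles every page at once, which is the property the article credits with making one template serve hundreds of consistently designed pages.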
There are advantages and disadvantages to using proprietary and nonproprietary software, and it is left to the library, virtual or physical, to determine the type of resources needed to meet the goals and mission of the university.8 Although the scope of this article focuses on the creation of tools for a homegrown CMS, some libraries may wish to explore commercial CMS packages that include additional services such as technical support. These CMS packages will vary in price and services depending on the vendor and the needs of the library.9

Ed Salazar (esalazar@ncu.edu) is Reference/Web Librarian at Northcentral University.

ELRC Transformed

In fall 2004, a group that consisted of two librarians, the education chair, and a programmer convened to discuss the redesign of the ELRC web site, which had become increasingly difficult to manage. Specifically, the amount of duplicated content, inconsistent design and layout, and unstructured architecture of the site posed severe navigational and organizational problems. The group selected and compared other academic library sites to determine a desired design and theme for the new ELRC site. Discussions also involved the addition of features such as a site search and breadcrumbs, which the group felt were essential. As a result, the creation of a homegrown CMS using proprietary software became the route of choice to meet the increasing demands of patrons and the need to expand the site. Because NCU utilizes Microsoft (MS) information system products, it was agreed that MS or MS-compatible applications would be used to create the CMS, which consisted of SQL Server, IIS, ASP, Visual Basic Script (VBScript), JSpell iFrame, and MS Visual InterDev.
MS Visual InterDev and JSpell iFrame supplanted our previous web editor, MS FrontPage, which seemed to generate superfluous code and thus made it difficult to debug or alter the design and layout of pages. Also, using JSpell iFrame eliminated the need for future NCU librarians to possess expertise in XHTML/HTML. With these pieces in place, the arduous task of culling content from static pages and entering it into a database was begun.

The Database

The SQL Server database helped in organizing and structuring content, and allowed for the creation of templates and administration (admin) pages.10 In addition, the database played an integral part in creating the search, breadcrumb, and site map features the group so desperately wanted. A significant amount of time was spent weeding the site for information that had become obsolete or irrelevant to the ELRC. It should be noted that the group originally attempted to use Access for the database but stumbled across several problems, one being the inability to maintain a stable and reliable connection to the database.

The Templates

With the database nearly complete, the programmer began creating ASP templates in MS Visual InterDev. These templates basically serve as the shell of the web page, preserving the design and layout elements of the page while extracting unique content based on a user's request. In essence, a single template can produce hundreds of pages consistent in design. Likewise, a single change to the template can alter the entire design of the site. For the ELRC, seven templates were created for more than 450 pages. Figure 1 shows the ELRC course guides template. Figure 2 shows the public view of the ELRC course guide template. Changes to the templates are made using MS Visual InterDev, which offers a user-friendly environment for managing web pages.
MS Visual InterDev also includes helpful features, such as highlighting code errors for easy debugging and the ability to access, create, and maintain stable connections to databases.11 In addition, the MS Visual InterDev editor recognizes commonly used ASP commands, allowing the user to save time by using keyboard shortcuts when programming. Besides the templates, ASP server-side include files and cascading style sheets (CSS) were incorporated, allowing for the easy modification of code in a single file instead of on each and every page or template. This is particularly time-efficient when having to add or change database connections or design elements. The ELRC also took extra precaution to ensure that style elements met the accessibility requirements and standards set forth by the World Wide Web Consortium (W3C), and tested the site in other browsers, such as Firefox and Netscape.12

As the site continues to grow and expand, so may the need for additional templates. Creation or replication of templates is simple, requiring a basic understanding of programming and the reassigning of new variables in the code to match added or modified tables. There is some speculation that in the near future the site may migrate to the ASP.NET environment for added functionality and security. If and when that time comes, the ELRC will be ready. At present, NCU is not considering the use of open source code or applications (the exception being the Apache web server); this is primarily due to the available technical support, security, and intuitiveness of use associated with commercial software. In addition, the NCU information system was built using commercial software, and a complete transition to open source is not, at the moment, possible or desirable.

With the templates complete, the ELRC began running a prototype of the new site, making it accessible to students and faculty from a link on the old site. A survey was created that allowed users to comment on the new site.
One detail of importance to note is that the survey duplicated a prior survey done on the old site in 2003, in order to provide the ELRC with comparative data.

The Admin Pages

The next phase of the project required the creation of admin pages, which would allow content to be quickly added, updated, and deleted on the site. These pages, like the templates, were created in MS Visual InterDev; display content is housed within the database on the web, thus allowing it to be changed on the fly. Figure 3 shows all of the web pages for the ELRC within a table. What is particularly convenient about the admin edit pages is the incorporation of the JSpell iFrame editor, which serves as the front-end editor for the site. The reason for using JSpell iFrame, as stated earlier, is its ease of use: the simple toolbar provides the basic, essential tools necessary for creating content without the daunting number of buttons and menu selections other editors tend to have. Also, JSpell iFrame is reasonably priced and does not entail a complex installation or require any space on local hard drives; instead, the program is maintained on the server. Consequently, all that is required is the insertion of the JSpell iFrame JavaScript code into the web pages.

In addition to JSpell iFrame, fields within admin edit pages are or can be pre-populated with content from the database. For instance, the title or display order of links can be easily edited or changed. Longer text fields comprised of paragraphs are created or modified using JSpell iFrame. Deleting a page is simple, requiring only the click of a delete button in the bottom right-hand corner. Figure 4 shows JSpell iFrame embedded within an admin edit page. The admin add page is straightforward: information is entered into the fields appearing on a form page, and the proper page type designation is selected from a drop-down menu.
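The admin pages draw page titles, ordering, and hierarchy from the database, and the breadcrumb feature mentioned earlier can be derived from those same parent-child rows. The article does not give the ELRC's schema or code, so the following is a minimal sketch under an invented parent-pointer layout: each page stores its parent's id, and the breadcrumb trail is built by walking that chain to the root.

```python
# Hypothetical page records keyed by id; "parent" of None marks the root.
pages = {
    1: {"parent": None, "title": "ELRC Home"},
    2: {"parent": 1, "title": "Course Guides"},
    3: {"parent": 2, "title": "PSY 101"},  # invented example page
}

def breadcrumb(page_id):
    """Walk parent pointers to the root, then join titles root-first."""
    trail = []
    while page_id is not None:
        page = pages[page_id]
        trail.append(page["title"])
        page_id = page["parent"]
    return " > ".join(reversed(trail))

print(breadcrumb(3))  # ELRC Home > Course Guides > PSY 101
```

The same parent-child rows that power this trail are what the admin pages' sorting feature exposes, which is why the article calls that structure "useful and necessary" when adding pages to the site.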
Yet, more importantly, the admin add and admin edit pages can filter information to specific users for security purposes and library needs. Figure 5 shows an admin add page. Figure 6 shows an admin edit page.

The admin pages were designed with flexibility in mind. Main column headings may be sorted, as seen in figure 3, allowing one to locate a particular page. The sorting feature also displays the inner structure of the database, which, in turn, identifies parent-child relationships between pages in the ELRC; this is useful and necessary when adding pages to the ELRC site. Due to the careful thought used in creating the admin pages, they have proven to be extremely effective and useful in maintaining a library web site. Each and every change to the site can be made on the web, allowing content to be edited remotely and eliminating the need for installing and maintaining expensive editing software on local and remote machines.

Usability Testing

With the site completed, the ELRC felt it important to perform usability tests. But how does a virtual library conduct usability testing when all of its students are distance education students? This is a difficult question that requires some ingenuity to answer. In order to solve this problem, staff members were propositioned (begged) to volunteer for the study; the total staff acquired was five. Also, a local college class of about ten students was persuaded to participate in the study. Granted, the total number of subjects is not representative of the NCU student body; however, substantial changes to the site were made from the data gathered. More usability testing is expected in the immediate future.

Figure 1. ELRC course guide template
Figure 2. Public view of the ELRC course guide template

The Findings

Usability testing complete, the site was launched.
during this period, a few minor hang-ups were experienced, including broken links, form page errors, and stray design elements, but these were only minor problems that were quickly fixed. feedback from the elrc survey showed that nearly all of the students and faculty, roughly fifty respondents, approved of the changes by commenting that the site had improved in layout and organization of content as well as navigation. also, responses and comments from usability testing participants were equally positive and encouraging. figure 7 shows the new ncu learners elrc home page. although it is difficult to establish a direct connection between the elrc site and usage, recent statistics appear promising. since the inception of the new site in december 2004, the number of visits to the elrc learners home page has jumped 10 percent. this number is expected to rise as ncu continues to grow and students become more acquainted and familiar with the site. the project took nearly six months to complete and required the expertise of a programmer. although programming may be outside the requisites of a distance librarian, managing the site is not. a general understanding of control statements and sql is all that is needed. for the distance librarian who spends almost all of his or her time online, these skills can be acquired on the job or by taking introductory programming courses at a local college. in the hope that the site will continue to expand in concert with the growing body of ncu students, the elrc recently added a writing center and blog. with the entire site now being database driven, adding, updating, and deleting content is done effortlessly. ideally, students and faculty will play a greater role in the development of the elrc site as a result of the changes. involving patrons with the site can play an integral, beneficial role in their academic pursuits. figure 3. web pages for elrc within a table. figure 4.
jspell iframe editor embedded within an admin edit page. conclusion. the elrc at ncu encourages other virtual or smaller libraries to explore their resources for improving their library web sites, which involves understanding campus resources and personnel. with the ever-burgeoning growth of technological resources, every library—small or large, virtual or physical, public or private—can empower itself to meet the needs of internet-savvy students. it is only a matter of being aware of the resources and putting them to good use. references and notes. 1. the ncu elrc web site comprises three separate sites: the public site www.ncu.edu/elrc (accessed dec. 2, 2004), the mentors site http://mentors.ncu.edu/elrc (accessed dec. 2, 2004), and the learners site http://learners.ncu.edu/elrc (accessed dec. 2, 2004). although similar in design, each site is tailored to meet the needs of each individual group as well as protect ncu’s resources, services, and information. access to subscription resources and personal information is available upon authentication of the user to the site. 2. for a detailed overview of virtual libraries, see valerie a. akuna, “virtual universities: the new higher education paradigm,” estrella mountain college, http://students.estrellamountain.edu/drakuna/virtualuniversities.htm (accessed feb. 15, 2005). 3. u.s. department of education, national center for education statistics, “the condition of education 2004,” distance education at postsecondary institutions, http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2004077 (accessed feb. 8, 2005). 4. for more information on the role of the virtual librarian in a virtual university, see jan zastrow, “going the distance: academic librarians in the virtual university,” university of hawaii–kapiolani community college, http://library.kcc.hawaii.edu/~illdoc/de/depaper.htm (accessed jan. 29, 2005). 5.
for an overview on developing an open source cms, please see mark dahl, “content management strategy for a college library web site,” information technology and libraries 23, no. 1 (2004). 6. for a detailed discussion on distance education and virtual libraries, see smiti gandhi, “academic librarians and distance education: challenges and opportunities,” reference & user services quarterly 43, no. 2 (2003). 7. for detailed information on using asp pages for managing databases, see xiaodong li and john paul fullerton, “create, edit, and manage web database content using active server pages,” library hi tech 20, no. 3 (2002); see also, bryan h. davidson, “database driven, dynamic content delivery: providing and managing access to online resources using microsoft access and active server pages,” oclc systems and services 17, no. 1 (2001). figure 5. admin add page. figure 6. admin edit page. 8. for advantages and disadvantages of open source and proprietary software, see john caroll, “open source versus proprietary: both have advantages,” special to cnet asia, http://asia.cnet.com/builder/program/work/0,39009380,39181451,00.htm (accessed feb. 4, 2004); see also, stephen shankland, “study: open-source database going mainstream,” cnet, http://ecoustics-cnet.com.com/study+open-source+databases+going+mainstream/2100-7344_3-5171543.html (accessed feb. 4, 2004). 9. for information on commercial content management vendors and prices, see cms watch, www.cmswatch.com/cms/vendors (accessed feb. 15, 2005). “sql server 2000 product overview,” microsoft windows server system, www.microsoft.com/sql/evaluation/overview/default.asp (accessed feb. 15, 2005). 10. for a review on visual interdev, see maggie biggs, “visual studio 6.0 demonstrates improved integration,” infoworld 20, no. 35 (1998), www.infoworld.com/cgi-bin/displaytc.p1?/reviews/980831vstudio6.htm (accessed feb. 4, 2004). 11.
“checklist of checkpoints for web content accessibility guidelines 1.0,” w3c, www.w3.org/tr/wai-webcontent/full-checklist.html (accessed feb. 1, 2005). 12. jspell iframe 2004, www.jspell.com/iframe-spell-checker.html (accessed dec. 2, 2004). figure 7. elrc learners home page. who rules the rules? “why can’t the english teach their children how to speak?” wondered henry higgins, implying that a lack of widely and consistently followed rules of usage created linguistic backwardness and anarchy. higgins’ question might be rephrased today as: “when will the code teach its founders how to catalog?” the library of congress has historically fitted catalog codes to its own practices rather than following them slavishly. the best example is the lamentable policy of superimposition: continued use of preestablished forms of names that are not in compliance with the paris principles or aacr1. this was a cause of widespread confusion and complaint, and the practice was eventually discontinued ... well, sort of discontinued. the various interpretations of aacr1, the inclusion of new rules, and pressure for further modifications eventually led to the drafting of aacr2, a code that was supposed to end variance and controversial practices. one might assume that including lc as a principal author of the new text and an lc official as one of the editors might result in a code that it could actually follow. judging by the spate of exceptions and interpretations made so far (more than 300), this has not been the case. in the place of superimposition, we have new impositions known as “compatible headings.” they may not be readily ascertained according to the rules, but have been granted a sort of bibliographic squatter’s rights.
although it would be simpler for catalogers to follow the rules consistently, they must instead check several cataloging service bulletins and name authorities to see whether lc has determined that a given personal, corporate, or serial name is already “compatible” with aacr2. this can result in cataloging delays, higher processing costs, and inconsistent entries. aacr2 and uncertainties regarding its application by lc have been widely credited with lower cataloging productivity. this is not to imply that lc is behaving in a strictly arbitrary or capricious manner vis-a-vis the code. they can be seen as caught on the horns of a trilemma, with vast internal needs and increasing external demands competing for a shrinking budget. president reagan may have whispered sweet nothings during national library week, but during budget hearings it became clear that libraries are not as “truly needy” as impoverished generals and interior decorators. decisions to depart from aacr2 have been based primarily on cost factors. the decision by the rtsd catalog code revision committee and the joint steering committee not to consider cost and implementation factors has led both to widespread opposition to the code, resulting in a one-year delay in implementation, and to the modifications that lc has made and is making. some variations, such as using “dept.” for “department” and “house” for “house of representatives,” make fiscal and common sense. 148 journal of library automation vol. 14/3 september 1981. many other lc changes are simply bibliographic nit-picking, minor irritants to catalogers who must flip back and forth between the text of aacr2 and half a dozen bulletins to settle a minor point of description. why didn’t lc representatives attempt to say, “wait a minute, we just can’t do that now,” while the code was being considered rather than after it was published?
anyway, considering that lc was starting up a whole new catalog and closing the old one, one wonders why rules not to be applied retrospectively had to be tinkered with to such an extent. major questions still to be resolved include not only the compatible-name quandary, but the treatment of serials, microform reproductions, establishment of corporate names and determination of when works “emanate from” corporate bodies, and the romanization of slavic names. the decision to use title entry for serials and monographic series even in the case of generic titles has been controversial. there are, of course, exceptions to the rules, and there will be differences in how uncertain catalogers construct complex entries with parenthetical modifiers. unfortunately, rules establishing entries for serials have sometimes been muddied rather than clarified in the bulletin. consider the example in the winter 1981 issue wherein the bulletin of the engineering station of west virginia university is entered under “bulletin,” while the same publication for the entire university is entered under “west virginia university bulletin.” also, consider the complex cross-reference structure required to direct users between the two files, both of which may well be split again, historically, between author/title and title main entry. this is a special problem in the case of large monographic series generated by corporate bodies. the lc position on microform reproductions of previously published works is clearer, but is still a point of controversy. they have decided to provide the imprint and collation (er, make that “publication, distribution, etc., area” and “physical description area”) of the original work, with a description of the microform in a note. in other words, they’re sticking to aacr1. the rtsd ccs committee on cataloging: description and access is currently trying to resolve this conflict, one in which many research libraries have sided with lc.
this body is also trying to unravel the mystique of “corporate emanation” introduced in aacr2. another sore point has been the lc decision to follow an alternative rule, which prefers commonly known forms of romanized names over those established via systematic romanization. that lc is correctly following the spirit of the general principle for personal names is little comfort to research libraries with large slavic collections. how are other libraries responding to the murky form of aacr2? some are closing old card catalogs and continuing them with com or temporary card supplements. some of these are establishing cross-reference links between variant forms of names between catalogs, while others are not. editorial/dwyer 149. some are keeping their catalogs open and shifting files, while others are splitting files. some are shifting some files and splitting others. aacr2 was intended to provide headings that could be easily ascertained by the user. ironically, the temporary result is scrambled catalogs: access systems involving multiple lookups and built-in confusion. until most bibliographic records are in machine-readable form under reliable authority control, this will continue to be the case. authority control, it would seem, has long been an idea whose time has come but whose application is yet to be realized. the cooperative efforts of the library of congress and the major bibliographic utilities to establish reliable automated authority control will do much to ameliorate the problems presented by aacr2. it would also be helpful if lc, perhaps with the financial assistance of other libraries, networks, and foundations, would publish what might be called aacr2½, not a new edition of the code but one accurately reflecting actual lc practice. finally, future code makers would be wise to consider cost and other implementation factors in their deliberations.
professor higgins, ever the optimist, would rather sing “wouldn’t it be loverly” than hear another verse of “i did it my way.” james r. dwyer. editor’s notes. title change. it often seems that the only things that change their names as often as library publications are standards organizations. not to be left out, jola will be called information technology and libraries beginning with volume 1, number 1, the march 1982 issue. this name was approved by the lita board in san francisco this june as more accurately reflecting the true scope of the journal. new section. with this issue, we are initiating a new section: “reports and working papers.” this is intended to help disseminate documents of particular interest to the jola readership. we solicit suggestions of documents, often developed as working papers for a specific purpose or group but of interest and value to our readership. in general, documents in this section are neither refereed nor edited. mitch. i take great personal pleasure in publishing mike malinconico’s speech upon presenting the 1981 lita award to mitch freedman. readers’ comments. we do continue to solicit suggestions about the journal but receive few. is anybody reading it? if you have any thoughts about what we should or shouldn’t do, we would welcome your sharing them. statistical behavior of search keys. abraham bookstein: graduate library school, university of chicago. editor’s note: the editor and author are aware that varying approaches may be taken to the problem presented here. readers are invited to respond in the form of a paper or a technical communication. in discussion about search keys, concern has been expressed as to how the number of items retrieved by a single value relates to collection size. this paper creates a statistical model that attempts to give some insight into this behavior.
it is concluded that, in general, the observed behavior can be explained as being intrinsically statistical in nature rather than being a property of specific search keys. an attempt is made to relate this model to other research, and to indicate how this model may be made to yield more accurate predictions. introduction. various experiments suggest that it may be possible to develop, as an access route into a file of bibliographic records, a search key* whose values can be easily derived from such bibliographic data as is likely to be available to its users.1 (* by the phrase “search key” we mean a key similar to the 3-3 or 3-1-1-1 keys used at ohio college library center and other places, which is made up by concatenating truncations of bibliographic data elements.) some concern, however, has been expressed regarding the nonuniqueness of these keys: if the number of items retrieved were often to exceed an amount easily handled by a user of the system, the value of this access route would be considerably diminished. accordingly, an important measure of search key performance is the frequency with which a large number of records is retrieved as the search key is applied to the file. this measure is related, for example, to how many memory accesses will be required, on the average, to retrieve all records satisfying a request; it is also an important consideration in deciding which display device should be installed in a system.2, 3 after evaluating such a measure for a search key on a particular file, it is reasonable to ask how that measure will change over time, as the file increases in size. the nature of this variation has already been of concern to researchers in the field. 110 journal of library automation vol. 6/2 june 1973. kilgour, on the basis of a number of experiments carried out at oclc, notes that “there remains a major problem to be
the problem is constituted of those replies that contain a number of entries exceeding the optimal maximum .. .. the major question to be answered is how truncated search keys will perform on files ten and a hundred times the size of that used in this experiment."' he elsewhere observes that "as a file of bibliographic entries increases, the maximum number of entries per reply does not increase in a one-to-one ratio ... . "5 this paper presents a mathematical model that addresses itself to the problem defined by kilgour and attempts to explain his observation; it is suggested that the gross features of the behavior are statistical in nature and not properties of specific search keys. a view of collection growth the cause of the phenomenon observed by kilgour can best be understood by first considering a simple model which, while not itself valid, does cast light on the nature of the behavior. this first model neglects the effect of randomness both in the growth of the collection and in the arrival of requests. it supposes our search key has the following property: regardless of collection size, the fraction of the collection retrieved by a particular search key value, v~, is exactly given by a constant f;; thus, if the fil e holds n records, a request for v 1 will retrieve n 1 = f,n records. this model similarly assumes that among any sizeable number of requests, the fraction of the time any particular search key value will occur is fixed; thus, for any subset of search key values, it is possible to determine how often members of that subset will occur among a set of requests. in particular, for any integer n, we can form the set of all the search key values that will retrieve less than n items. we can then determine how often search key values from that set are requested. if, for example, requests for these values occur 99 percent of the time, then we can assert that 99 percent of the time less than n items will be retrieved. 
if the file contains $N$ items, then these $n$ items constitute the fraction $f = n/N$ of the file. should the collection size increase to $LN$, then the model predicts that 99 percent of the time less than $f \cdot (LN) = Ln$ items would be retrieved. in other words, we have precisely the behavior kilgour observes does not occur. this argument shows that a simple deterministic model does not conform to experience with search keys. the model breaks down in two ways, which accounts for the discrepancy between the results derived from it and kilgour's observations: 1. in any actual library, the fraction of the time that a particular request will appear within a sequence of requests will vary; and 2. in comparing two different samples having the same size, the number of items having a given search key value will vary. the first of these factors is easily dealt with, and its analysis will suggest the number of requests to use in a test of search key behavior in a given library. for a particular collection, let $S$ denote the set of search key values for which, say, twenty or more items are retrieved. we would like to find the fraction of the time that a request in $S$ occurs in the long run; suppose this value is in fact $q$. then among $M$ requests, the probability that $m$ members of $S$ occur is given by the binomial distribution $f_B(m \mid q, M)$. this distribution has a mean of $qM$ and a variance of $Mq(1-q)$. should we desire to estimate the actual fraction of the time that twenty or more items will be retrieved, we can take a sample of $M$ requests and compute $\hat{q}$, the fraction of the requests with search key values in $S$; if we do so, we will usually get a value for $\hat{q}$ between $q - \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$ and $q + \frac{2}{\sqrt{M}}\sqrt{q(1-q)}$. if, for example, $q = .01$ and $M = 10{,}000$, we would tend to find $\hat{q}$ in the interval $.01 \pm .002$.
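the two-standard-deviation sampling bound just described, and the paper's numerical example, can be checked directly. a minimal sketch; the simulation's population fraction is the paper's illustrative q = .01, everything else is ordinary binomial arithmetic:

```python
import math
import random

def sampling_interval(q, m):
    """Interval q +/- (2/sqrt(m)) * sqrt(q*(1-q)) within which the
    observed fraction of requests falling in S will usually lie."""
    half_width = 2.0 * math.sqrt(q * (1.0 - q)) / math.sqrt(m)
    return q - half_width, q + half_width

# the paper's example: q = .01 and m = 10,000 gives roughly .01 +/- .002
lo, hi = sampling_interval(0.01, 10_000)
print(round(lo, 4), round(hi, 4))   # -> 0.008 0.012

# quick simulation: draw 10,000 requests and observe the fraction in S
random.seed(1)
m = 10_000
q_hat = sum(random.random() < 0.01 for _ in range(m)) / m
print(abs(q_hat - 0.01) < 0.005)    # q_hat lands close to q
```

the simulation illustrates the point of this passage: the error from randomness in request arrivals shrinks as the square root of the number of requests sampled.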
thus the effect of randomness in the arrival of requests can easily be controlled by increasing the number of requests considered; furthermore, the size of the error can be predicted. we next introduce the second factor; its analysis will suggest how the behavior of search keys will change as the collection grows in size. for this purpose we adopt a model of collection growth which assumes that as items arrive, they are randomly distributed among the search key values in accordance with some probability distribution. if we suppose that the probability of an item being assigned a specified search key value, $v_i$, is $p_i$, then in a collection of $N$ items we may conclude that the probability of $n$ items having that value is given by the binomial distribution: $f_B(n \mid p_i, N) = \binom{N}{n} p_i^n (1-p_i)^{N-n}$. if $g'(v_i)$ is the probability that the value $v_i$ is selected from the request population, then the probability that the “next” request retrieves $n$ items is given by $\sum_i g'(v_i)\, f_B(n \mid p_i, N) = \int g(p)\, f_B(n \mid p, N)\, dp$, where $g(p)\,dp = \sum_{p \le p_i \le p+dp} g'(v_i)$ is the probability that a request arrives with value $p_i$ in the interval $(p, p+dp)$; $g$ will be treated as a continuous function.** since the expectation of the binomial distribution is given by $pN$, we have $N \int p\, g(p)\, dp = N\bar{p}$ as the expected number of items retrieved by a random request; since this is proportional to $N$, doubling the size of the collection will, on the average, double the amount of material retrieved. similarly, the variance, $\sigma^2$, is given by $N^2(\overline{p^2} - \bar{p}^2) + N \int p(1-p)\, g(p)\, dp$. should $\overline{p^2} - \bar{p}^2$, the variance of $p$, be small, this reduces to $N \int p(1-p)\, g(p)\, dp = \bar{\sigma}^2 N$, so that approximately 95 percent of the time the amount of material retrieved would be less than $N\bar{p} + 2\sqrt{N}\,\bar{\sigma} = N\left(\bar{p} + \frac{2\bar{\sigma}}{\sqrt{N}}\right)$. ** this result would more precisely be expressed as $\int f_B(n \mid p, N)\, dG(p)$, which has the form of a stieltjes integral.
the expression used in the text is simpler and reasonably valid because of the vast number of values the search key can take. it is the factor $\frac{2\bar{\sigma}}{\sqrt{N}}$, and its dependence on $N$, that may account for kilgour's nonlinearity, and not any property intrinsic in the nature of any type of search key. thus, to the extent that this model reflects what is really happening, the 95 percent point increases roughly proportionately with file size; the “constant” of proportionality, however, is the sum of two terms: the first is a true constant, and the second is a term that approaches zero as the file gets larger. in particular, this model suggests that we will never reach a leveling-off point: as the file increases in size, the number of items retrieved will also increase, and the pattern of increase will become increasingly linear. up to this point this discussion has been qualitative in nature, being based upon general statistical considerations and making use of the normal approximation to some unknown distribution; its broad conclusions are, however, consistent with the findings of earlier workers and can explain certain unanticipated properties of search keys. to proceed further it will be necessary to restrict the form of the function $g(p)$; this will be attempted in the following section of this paper. relationship of model to earlier research. interest in access methods that are appropriate for files of bibliographic data has generated a considerable amount of empirical research on search key behavior. of necessity, this pioneering work has been of a descriptive nature, resulting in data showing search key behavior in specific environments. while these efforts have lent a good deal of insight into the nature of search keys, the basic weakness of such research lies in the difficulty of extending these findings to other situations. one purpose of a mathematical model such as
the one being developed here is to provide this increased generality by representing in a concise and easily manipulated form the results of previous research. it is accordingly of interest to indicate the relationship between previous work on search keys and our model. research on search key performance has been of two kinds. the first kind seeks to answer the question: for any number, $n$, how many search key values retrieve $n$ items? the answer to this question depends only on the search key and the collection; it is independent of the pattern of request arrivals. the second kind of research involves the actual arrival of requests; it tries to answer the question: for any number $n$, how frequently will requests resulting in the retrieval of $n$ items occur? to discuss this research in terms of our model requires a closer examination of the function $g(p)$ previously defined. we recall that $g(p)\,dp = \sum_{p \le p_i \le p+dp} g'(v_i)$, with $dp$ being a small number. thus $g(p)$ is determined by two factors: a. the number of search key values in the interval $(p, p+dp)$. let us denote this value by $f(p)\,dp$, so $f(p)$ is the density of search keys at $p$. we make use here of the fact that although the number of possible search key values is finite, the number is very large, so their distribution can be thought of as continuous. b. the average probability of search keys, with values $p_i$ near $p$, being requested. we shall refer to this quantity as $g''(p)$. by combining these factors we have $g(p) = g''(p)\, f(p)$. in terms of this discussion, the first type of research described above is in fact estimating $f(p)$: if there are $s$ search key values that retrieve $n$ items from a collection of $N$ items, then $s$ is an estimate of $\frac{1}{N} f\!\left(\frac{n}{N}\right)$; this relation uses $n = pN$ and $dp = \frac{1}{N}$. the second kind of research directly estimates $g(p)$.
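the first kind of estimate described here can be sketched numerically: given how many items each search key value retrieves in a file of N items, the count s of key values retrieving exactly n items estimates (1/N) f(n/N). a minimal sketch with a hypothetical toy collection (the counts are invented for illustration):

```python
from collections import Counter

def density_estimate(items_per_key, big_n):
    """For each retrieval size n, the number s of key values retrieving
    exactly n items estimates (1/N) * f(n/N); so N*s estimates the
    density f of search key values at p = n/N."""
    s = Counter(items_per_key)   # n -> number of key values retrieving n items
    return {n / big_n: big_n * count for n, count in s.items()}

# toy collection: six key values retrieving 1, 1, 2, 2, 2, 3 items (N = 11)
counts = [1, 1, 2, 2, 2, 3]
N = sum(counts)
f_hat = density_estimate(counts, N)
print(f_hat[2 / N])   # -> 33, since three key values retrieve 2 items and N*3 = 33
```

this mirrors the text: the estimate depends only on the key and the collection, not on the pattern of request arrivals.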
guthrie, in a recent paper, provides a bridge between the two types of research by discussing his findings in terms of two models.6 one of his models, which asserts that each search key value has an equal chance of being requested, is equivalent to the assumption that $g''(p) = 1$, and $g(p) = f(p)$. guthrie finds that this is not an adequate representation of his data. guthrie's second model asserts that each item has an equal chance of being requested. in our terms this becomes $g''(p) \propto p$, and $g(p) \propto p\, f(p)$. this model, while an improvement over the first, still disagrees with the data. furthermore, these models do not estimate $f(p)$; even if guthrie's model were correct, we would not know the probability that $n$ items would be retrieved until we were told how many search key values contained $n$ items. in the next section we will try to remedy this situation by means of a two-parameter representation of $g(p)$. a representation of $f(p)$. to get a more detailed account of search key behavior by experiment is difficult since the two aspects of randomness already discussed are confounded; the experimenter only sees the combined effect. we will, however, try to estimate the distribution $g(p)$ by a distribution of the form $\frac{(\alpha+\beta+1)!}{\alpha!\,\beta!}\, p^{\alpha} (1-p)^{\beta}$. we believe that such an attempt is reasonable on three grounds: a. it is not possible to find $g(p)$ exactly, and moreover, it is not clear that this would be desirable. we are interested in a reasonable approximation that is satisfactory for decision-making purposes; b. the above distribution assumes a wide variety of shapes as $\alpha$ and $\beta$ vary; it seems likely that values of $\alpha$ and $\beta$ can be found for which this distribution is close enough to $g(p)$; and c. this distribution is mathematically tractable. if we proceed using the above approximation for $g(p)$, we find: (i) the probability, $P(n)$, of $n$ items being retrieved is given by

1. $P(n) = \binom{N}{n}\, \frac{(\alpha+\beta+1)!}{\alpha!\,\beta!} \cdot \frac{(\alpha+n)!\,(N-n+\beta)!}{(\alpha+\beta+N+1)!}$;

(ii) the expected number of items retrieved, $E$, is given by

2. $E = N\, \frac{\alpha+1}{\alpha+\beta+2}$; and

(iii) the variance, $V$, of the number of items retrieved is given by

3. $V = N\, \frac{\alpha+1}{\alpha+\beta+2} \cdot \frac{\beta+1}{\alpha+\beta+3} \left(1 + \frac{N}{\alpha+\beta+2}\right)$.

if the experiment is performed on a small sample, the expectation and variance can be computed and the values of $\alpha$ and $\beta$ estimated from the relations

4. $\beta = (\alpha+1)\, \frac{N}{E} \left(1 - \frac{E}{N}\right) - 1$, and

5. $\alpha = \dfrac{E\left(1 - \frac{E}{N}\right) - \frac{V}{N}}{\frac{V}{E} - \left(1 - \frac{E}{N}\right)} - 1$.

usually $\frac{E}{N}$ will be much smaller than one; in this case we may use the approximations:

4'. $\beta = (\alpha+1)\, \frac{N}{E}$, and

5'. $\alpha = \dfrac{E}{\frac{V}{E} - 1} - 1$.

once $\alpha$ and $\beta$ have been evaluated, we can compute the probabilities $P(n)$ for files of arbitrary size, and with these values we can make assertions regarding the probability of, say, more than 30 items being retrieved. a relation that can be derived from formula 1 and may be of use when comparing this model with experiment is:

$\frac{P(n)}{P(n+1)} = \frac{n+1}{N-n} \cdot \frac{\beta+N-n}{\alpha+n+1}$.

the probability of zero retrievals is likely to be an extraordinary point in the distributions $g(p)$ and $P(n)$ since it is influenced by the knowledge that a user may have of the collection; this effect is likely to be encountered in a sampling process in which the requests have to be generated artificially. in such cases it would be advisable to treat $P(0)$ as an empirically derived parameter, $\theta$, and use the modified formula

6. $P'(n) = \begin{cases} \theta & \text{if } n = 0 \\ (1-\theta)\, \dfrac{P(n)}{1 - P(0)} & \text{if } n \neq 0. \end{cases}$

the value of $\theta$ can be estimated by the fraction of requests retrieving zero items; for sampling techniques using only productive requests, $\theta$ will be zero. $\alpha$ and $\beta$ can be calculated as before from the mean and variance of the sample. conclusion. the above discussion is intended as an attempt to provide some theoretical understanding of the puzzling behavior discovered in the use of search keys and also to provide some guide for those experimenting with samples of such files.
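the beta-binomial machinery of formulas 1-5 can be exercised numerically. a minimal sketch, assuming integer α and β so factorials suffice; the parameter values (N = 50, α = 1, β = 30) are hypothetical and chosen only for illustration, not taken from the paper's data:

```python
from math import comb, factorial

def p_n(n, N, a, b):
    """Formula 1: probability a random request retrieves n items from a
    file of N items, under the beta approximation to g(p)."""
    const = factorial(a + b + 1) // (factorial(a) * factorial(b))
    return (comb(N, n) * const * factorial(a + n) * factorial(N - n + b)
            / factorial(a + b + N + 1))

def mean_var(N, a, b):
    """Formulas 2 and 3: expectation E and variance V of items retrieved."""
    E = N * (a + 1) / (a + b + 2)
    V = E * (b + 1) / (a + b + 3) * (1 + N / (a + b + 2))
    return E, V

N, a, b = 50, 1, 30                    # hypothetical parameters
probs = [p_n(n, N, a, b) for n in range(N + 1)]
print(round(sum(probs), 9))            # distribution sums to one

# empirical mean/variance of the distribution match formulas 2 and 3
E = sum(n * p for n, p in enumerate(probs))
V = sum((n - E) ** 2 * p for n, p in enumerate(probs))
E2, V2 = mean_var(N, a, b)
print(round(E - E2, 8), round(V - V2, 8))

# formulas 4 and 5 recover alpha and beta from the observed E and V
alpha = (E * (1 - E / N) - V / N) / (V / E - (1 - E / N)) - 1
beta = (alpha + 1) * (N / E) * (1 - E / N) - 1
print(round(alpha), round(beta))       # -> 1 30
```

as the conclusion suggests, once α and β are fitted from a sample in this way, P(n) can be recomputed with a larger N to predict behavior as the file grows.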
we do, however, urge caution for the latter uses. an analysis similar to the above can be useful under several different circumstances, such as: determining the future behavior expected of a search key in a single library as the collection grows; determining the behavior for one library based upon experiments conducted on a different but similar library; and extrapolating from the performance of a search key in a sample of the collection to its performance in the full collection. if one wishes to compare two different libraries, one can note that, as far as search key values are concerned, a particular library's collection can be thought of as a random sample of the larger population from which it selects its material, and accordingly the formula for $P(n)$ should be valid. in this case, if two different collections are drawn from the same population, the $g(p)$ refers to this population and the libraries are distinguished by the parameter $N$; when we are considering samples from a single library, then $N$ is the sample size and $g(p)$ refers to the library itself. no theoretical basis exists at present for estimating to what extent the populations being considered depend upon the type of library, if any, so this problem must be dealt with empirically. we have assumed here that these populations are similar with regard to search key values. should these populations in fact vary, it is possible that they can be broken down, e.g., by language, into subpopulations that are stable and for each of which the analysis is valid. acknowledgments. this work was made possible by clr/neh grant no. e0-262-70-4658. i would like to express my gratitude to members of the university of chicago systems development office for their many comments and suggestions on this work. references. 1. frederick g. kilgour, philip l. long, eugene b. leiderman, and alan l.
Landgraf, "Title-Only Entries Retrieved by Use of Truncated Search Key," Journal of Library Automation 4:207–10 (Dec. 1971).
2. A. Bookstein, "Double Hashing," Journal of the American Society for Information Science 23:402–25 (Nov.–Dec. 1972).
3. A. Bookstein, "Hash Coding with a Non-Unique Search Key," to be published in the Journal of the American Society for Information Science.
4. Frederick G. Kilgour, Philip L. Long, Eugene B. Leiderman, and Alan L. Landgraf, "Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys." Preprint.
5. Kilgour, Long, Leiderman, and Landgraf, "Title-Only Entries," p. 209–10.
6. Gerry P. Guthrie and Steven D. Slifko, "Analysis of Search Key Retrieval on a Large Bibliographic File," Journal of Library Automation 5:96–100 (June 1972).

Letter from the Editor

Kenneth J. Varnum

Information Technology and Libraries | March 2018 1
https://doi.org/10.6017/ital.v37i1.10388

This issue marks 50 years of Information Technology and Libraries. The scope and ever-accelerating pace of technological change over the five decades since Journal of Library Automation was launched in 1968 mirrors what the world at large has experienced. From "automating" existing services and functions a half century ago, libraries are now using technology to rethink, recreate, and reinvent services, often in areas that were once simply the realm of science fiction. In an attempt to put today's technology landscape in context, ITAL will publish a series of essays this year, each focusing on the highlights of a decade. In this issue, editorial board member Mark Cyzyk talks about selected articles from the first two volumes of the journal. In the remaining issues this year, we'll tackle the 1970s, 1980s, 1990s, and 2000s. The journal itself, now as ever before, focuses on the present and the near future, so we will hold off recapitulating the current decade until our centennial celebration in 2068.
As we look back over the journal's history, the editorial board is also looking to the future. We want to make sure that we know for whom we are publishing these articles, and to make sure that the journal is as relevant to today's (and tomorrow's) readership as it has been for those who have brought us to the present. To that end, we invite anyone who is reading this issue to take this brief survey (https://umich.qualtrics.com/jfe/form/sv_6hafly0cyjpbk4j): tell us a little about how you came to ITAL today, how you're connected with library technology, and what you'd like to see in the journal. It won't take much of your time (no more than 5 minutes) and will help us understand the context in which we are working. There's another opportunity for you to help shape the future of the journal. Due to a number of terms being up at the end of June 2018, we have at least five openings on the editorial board to fill. If you are passionate about libraries and technology, enjoy working with authors to shape their articles, and want to help set out today's scholarly record for tomorrow's technologists, submit a statement of interest at https://goo.gl/forms/5gbqouuseolxrfx52. We seek to have an editorial board that represents the diversity of library technology practitioners, and particularly invite individuals from non-academic libraries and underrepresented demographic groups to apply.

Sincerely,
Kenneth J. Varnum
Editor
March 2018

Accessible, Dynamic Web Content Using Instagram

Jaci Wilkinson

Information Technology and Libraries | March 2018 19

Jaci Wilkinson (jaci.wilkinson@umontana.edu) is Web Services Librarian at the University of Montana.

Abstract

This is a case study in dynamic content creation using Instagram's application program interface (API).
An embedded feed of the Mansfield Library Archives and Special Collections' (ASC) most recent Instagram posts was created for their website's homepage. The process to harness Instagram's API highlighted competing interests: web services' desire to most efficiently manage content, ASC staff's investment in the latest social media trends, and everyone's institutional commitment to accessibility.

Introduction

The Mansfield Library Archives and Special Collections (ASC) at the University of Montana had a simple enough request. Their homepage had been static for years, and it was not possible to add more content creation to anyone's workload. However, they had a robust Instagram account with more than one thousand followers. Was there any way to synchronize workflows with an Instagram embed on the homepage? The solution was more complicated than we thought. We developed an Instagram embed, but in the process grappled with some fundamental questions of technology in the library. How do we streamline the creation and sharing of ephemeral, dynamic content? How do we reconcile web accessibility standards with the innovative new platforms we want to incorporate on our websites? Libraries have invested heavily in social media to improve their approachability, reduce library anxiety, and interact with their users. At the Mansfield Library, this investment has paid off for ASC. This unit was an early adopter of Instagram, a photo and short video-sharing application for sharing with the public or approved followers. The ASC Instagram account launched in January 2015, and staff quickly settled on the persona of "Banjo Cat" to share collection items and relevant history. Banjo Cat was inspired by a whimsical nineteenth-century photograph in ASC of a cat playing a banjo (see figure 1). ASC now has about 1,200 followers, including many other libraries, archives, and special collections.
In fact, connecting to a wider community of similar institutions was a driving factor in creating an Instagram account. The ASC staff member who updates the account said, "While we have lots of interactions with patrons on Facebook we have basically zero interactions with other institutions. Instagram is all about interacting with other institutions, sharing ideas for posts, commenting on posts. So by learning about this community and participating and interacting with it we are able to . . . learn about programs and ideas that we would probably not have access to otherwise."1

https://doi.org/10.6017/ital.v37i1.10230

Figure 1. Banjo Cat by L. A. de Ribas. Mansfield Library Archives and Special Collections. 1880s.

But while ASC's social media thrived, its website was bereft of dynamic content. Given that the ASC homepage is the ninth most visited page on the library site, it felt like a wasted opportunity to let such a highly trafficked area lack engaging, current, and appealing content. It seemed only natural to harness the energy put into the ASC Instagram account and embed that same light-hearted, community-oriented, and collection-focused content on the ASC homepage.

Literature Review

Libraries are enthusiastic adopters of social media; one study even shows that as of 2013, 94 percent of academic libraries had a social media presence.2 A 2006 Library Journal article observed the following about MySpace, then a popular social media platform: "Given the popularity and reach of this powerful social network, libraries have a chance to be leaders on their college campuses and in the larger community by realizing the possibilities of using social networking sites like MySpace to bring their services to the public."3 This open-minded spirit and willingness to try new technology trends was shrewd.
Pew Research reports that as of 2016, 69 percent of Americans use some type of social media.4 Social media use has grown more representative of the population: the percentage of older adults on at least one social media site continues to increase.5 For academic libraries, the pull of Facebook was immediately strong because of the initial requirement for users to have a .edu address. Academic libraries very early on attempted to connect with students about services, resources, and spaces using Facebook.6

Dynamic content is a gateway to building interest toward, and buy-in to, an institution. In user experience literature, "user delight" is "a positive emotional affect that a user may have when interacting with a device or interface."7 In Walter's hierarchy of user needs, pleasure tops all other needs.8

Figure 2. Aarron Walter's hierarchy of user needs, from Therese Fessenden, "A Theory of User Delight: Why Usability Is the Foundation for Delightful Experiences," Nielsen Norman Group, March 25, 2017, https://www.nngroup.com/articles/theory-user-delight/.

Using social media to engage users with special collections has its own niche. Special collections are typically housed in closed stacks and have no digital equivalent.
Often the materials housed in special collections are rare, fragile, exotic, beautiful, and unusual; a study of library blogs and social media found that those with higher aesthetic value received more visitors and more revisits.9 Social media "gives users an idea of what the collection offers while it promotes and potentially gains foot traffic."10 It has even been suggested that social media gives special collections the opportunity to stand in when digitization isn't possible: "Instead of digitizing a whole collection, librarians can highlight important parts of the collection with a snippet of its history."11 In creating UCLA's Powell Library Instagram account, librarian Danielle Salomon writes, "Special collections items and digital library images can be a treasure trove of social media content. One of our library's goals is to increase students' exposure to special collections items, so we draw heavily from these collections."12

Instagram is a relative newcomer to social media, but it has been consistently successful since its inception in 2010.13 As of 2016, 28 percent of Americans use Instagram, up from 11 percent in 2013.14 Facebook bought Instagram in 2012 and has since bolstered the application's success by making the two platforms easy to navigate and share between. After Vine, a short video application, was shuttered in 2017, Instagram's ability to take and post short videos has increased its value. Instagram is distinct in that it is mobile-dependent: it is difficult to run the application through a web browser, and only one device can operate an Instagram account. Within the library community, Instagram's adoption has been strongest in academic libraries.
This is tied to the high number of Instagram users who are college-age.15 Another reason libraries select Instagram is that it has more diverse users than other social media applications, specifically African Americans and Latinos.16 In a 2016 study, Instagram was the second most popular pick among college students at Western Oregon University when asked what social media application the library should use (Twitter came in first). The most popular use of Instagram in academic libraries is familiarizing students with services, resources, and spaces. Uses include first-year instruction activities to combat library anxiety and mini-contests that ask users to identify what posted photos depict.17 UCLA's Powell Library discovered students posting Instagram photos of their spaces, so they initially joined to repost those photos and interact with those users. Instagram makes a library seem approachable. Librarian Joanna Hare reflected on this discovery: "Instagram is really powerful in that respect because you can just snap a few photos [and] show what's going on . . . so that students don't view the library as being intimidating."18 Approachability is augmented by delegating photography and posting tasks to library student employees.

Social media is less often seen as a way to help create dynamic content for a library's website. The exceptions to this trend have come from institutions with substantial technology resources. North Carolina State University created open-source software that adds photos posted by anyone on Instagram to a library photo collection when a certain hashtag is used.19 The University of Nebraska's Calvin T. Ryan Library created an RSS feed that disseminates blog posts to Twitter, Facebook, and the library homepage. Posts from followed accounts on Twitter and Facebook are also a part of the resulting feed.
The RSS feed requires use of a third-party tool called dlvr.it (https://dlvrit.com/), which supports many other social media applications, but not Instagram.

A notable absence in the literature on social media use in libraries is any mention of accessibility concerns. The "Improving the Accessibility of Social Media for Public Service" toolkit developed by a group of US government offices is a useful resource that includes specific guidelines on making Instagram posts more accessible.20 The toolkit explains that "more and more organizations are using social media to conduct outreach, recruit job candidates and encourage workplace productivity. . . . But not all social media content is accessible to people with certain disabilities, which limits the reach and effectiveness of these platforms. And with 20% of the population estimated to have a disability, government agencies have an obligation to ensure that their messages, services and products are as inclusive as possible."21 Given the stated importance of social media in library literature, the lack of conversation about accessibility and social media is a barrier to inclusivity.

Mansfield Library Archives and Special Collections' Instagram Feed

Dynamic content was lacking from every part of the ASC website, but staff had little time for, and little knowledge of, the content management system needed to create web content. There was a drive to solve this problem because a new web services librarian had recently been hired. When the web services librarian learned of ASC's thriving Instagram presence, she pursued the possibility of including that content on the ASC website. She felt that, in addition to being more efficient, content creation should stay in-house given the highly specialized nature of ASC's collections, spaces, and resources.
The ideal solution would allow ASC staff to create and manage an Instagram feed unassisted; the web services librarian sought the simplest possible solution for them. Our content management system and Instagram's developer website were first consulted in the hope that one provided an automated embed or plugin. Our content management system, Cascade, could pull in content from Facebook and Twitter but not Instagram, and Instagram did not have an automated feed creator. After more research, we learned that third-party Instagram feed embeds are the only possible way to create an Instagram feed without using Instagram's API. The API was considered a last-resort option because we knew that ASC staff could not manage the code themselves. The idea of using any third-party service was undesirable because of a lack of control, stability, and accessibility. If the service had technical issues or went out of business, it would be very noticeable given the visibility of ASC's homepage.

In 2012, a student advocacy organization at the University of Montana filed a civil rights complaint with the US Department of Education focusing on disabled students' unequal access to electronic and information technologies. Since then, the Mansfield Library has been proactive in eliminating barriers to access.22 Given this history, we are wary of the accessibility of third-party applications to someone using assistive technology, most likely a screen reader. Juicer (https://www.juicer.io/), for example, is a freely available service for an Instagram feed, but in exchange it retains its branding prominently at the top of the feed. An example of Juicer in use can be found on the home page of the Baltimore Aquarium (http://aqua.org/). Tests of Juicer showed that it was not accessible for a screen reader. Finally, it didn't fit our need: Juicer curates posts from other users depending on hashtags and reposts, but we only wanted to feature our own content.
The unpredictability of other accounts' posts ending up on the ASC homepage was not desirable.

Instagram's developer site did not make finding a solution easy. The page titled "Embedding" is about embedding individual posts on a webpage, not a whole feed.23 This content does not even link out to an explanation of how to embed a feed. The "Authentication" page is where the process begins, because calling the API requires a token from an authenticated Instagram account user.24 A user is authenticated by creating a client ID and then receiving an access token. Another interesting roadblock provided by the Instagram developer site is that the "Authentication" page provides no further information about using the access token to call the API. It took outside research to finally figure out the steps needed to make the API requests for ASC's feed.25 PHP code is used to call the API and copy the three most recent ASC Instagram posts to a local server file. (Using JavaScript to call the API is a poor choice because that code would make the account's access token public. If anyone sees this token, they can use it themselves to pull your feed using the Instagram API.) CSS replicates the look and feel of Instagram with white, minimalistic icons and a simple photo display that darkens and shows the beginning of the description when a user's mouse hovers over it. All code from this project is freely available on GitHub.26

There is a catch to this embedded feed process. The directions given through Instagram and by the online sources we used only took us to sandbox mode (in web development, sandbox refers to a restricted or test version of a final product). In sandbox, Instagram limits the number of requests to the API. Unfortunately, a request was made every time someone went to the ASC page.
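The sandbox problem just described, one API request per page view against a rate-limited endpoint, is conventionally solved with a local cache in front of the API call. The sketch below is illustrative only: it is Python rather than the PHP used on the site, and the cache path, refresh interval, and feed URL are all placeholders rather than the project's actual values.

```python
import json
import os
import time
import urllib.request

CACHE_FILE = "instagram_cache.json"   # hypothetical local cache path
MAX_AGE = 24 * 60 * 60                # refresh at most once a day, in seconds

def get_recent_posts(feed_url):
    """Return cached posts, hitting the remote API at most once per MAX_AGE.

    feed_url stands in for whatever authenticated endpoint the account's
    access token permits. Running this server-side also keeps the access
    token out of client-facing JavaScript.
    """
    # Serve the cached copy if it is still fresh.
    if os.path.exists(CACHE_FILE):
        age = time.time() - os.path.getmtime(CACHE_FILE)
        if age < MAX_AGE:
            with open(CACHE_FILE) as f:
                return json.load(f)
    # Cache missing or stale: call the remote API and rewrite the cache.
    with urllib.request.urlopen(feed_url) as resp:
        posts = json.load(resp)[:3]   # keep the three most recent posts
    with open(CACHE_FILE, "w") as f:
        json.dump(posts, f)
    return posts
```

Because the freshness check happens before any network access, page views never trigger API traffic while the cached file is current, which keeps a rate-limited account well under its quota.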
The initial feed stopped working in minutes because we did not realize what this limitation of sandbox mode meant. Another look at the Instagram developer site taught us that the only way to leave sandbox was to have our "app," as Instagram called it, reviewed.27 In other words, Instagram has only set up its API to be used for full application development (like Juicer). We decided not to leave sandbox mode because of uncertainty about what Instagram's review process would entail. If our app were rejected, would they force us to discontinue our work? The timeline for the approval process was also uncertain. Distrust and uncertainty, unfortunately, guided our decision-making at this stage. Instead of undergoing the review process, the PHP code was reconfigured to call the API only once a day. This made the feed less dynamic because it was not updating in real time. For our purposes this was not a problem; the ASC Instagram account is updated at most once or twice a week anyway. As a result, we are "scraping" ASC's Instagram account. Although "crawling, scraping, and caching" are prohibited by Instagram's terms of use, other Instagram feeds on GitHub have similar workarounds and point out that a plugin/scraper "uses (the) same endpoint that Instagram is using in their own site, so it's arguable if the toc [terms of use] can prohibit the use of openly available information."28

While figuring out how to work with the Instagram API, a major accessibility roadblock cropped up: there was no place for alt text, the descriptive information about an image that is used by assistive technologies for users with low vision. Besides taking or uploading a photo, the only other actions offered when creating a new post were to write a caption, tag people, or add a location. Only the caption allowed for a text string. Without alt text, not only is the Instagram feed unintelligible to a screen reader, but it disturbs a screen reader user's interaction with all other content on that page.
An ASC staff member discovered a solution when she noticed a Joshua Tree National Park Instagram post with alt text at the bottom of the caption. Although initially put off by the "wordiness," we concluded this was the only logical way to move forward. The benefits of this format of alt text came into focus as we moved through the project: the ASC staff member was able to choose the desired alt text without any additional steps or skills, and we grew to relish the opportunity to explain to curious users what the #alttext hashtag meant and why it was important to us. PHP code isolates all text after #alttext and displays that as the alt text to a screen reader.

Since the Instagram feed was implemented, it has been interesting to follow how the Instagram developer site has changed and grown. Although Facebook has owned Instagram for five years, the Instagram developer site is only now starting to link out to Facebook developer content. Most recently, the Instagram developer site has been advertising the Instagram Graph API for use by business accounts. This type of development is useless for us because we have a personal Instagram account, not a business account. And the function of the Instagram Graph API is focused on the internal user and analytics, not the end user and user experience. Even if the Instagram Graph API were available for personal accounts, it is worth asking if this type of data collection would be of use to an organization that doesn't have the labor of a devoted marketing team.

Dynamic content through social media and web content provides opportunities to create user delight because it focuses on visually appealing, fun, timely, and interesting information. For archives, special collections, and other cultural heritage institutions, this content is particularly useful because it provides a look into collections that are interesting and rare but also fragile and housed in closed stacks.
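The #alttext convention described above amounts to a single string split. As a hedged illustration (Python here, whereas the site implements this in PHP; the function name is mine), the caption is partitioned at the #alttext hashtag, and everything after it is what would be exposed to assistive technology as the image's alt attribute:

```python
def split_caption(caption):
    """Split an Instagram caption into display text and alt text.

    Mirrors the convention described in the article: everything after
    the #alttext hashtag in the caption is treated as the image's alt
    text; everything before it is the visible caption.
    """
    marker = "#alttext"
    head, sep, tail = caption.partition(marker)
    display = head.strip()
    alt = tail.strip() if sep else ""   # no marker means no alt text
    return display, alt
```

A caption without the marker simply yields an empty alt string, so posts written before the convention was adopted degrade gracefully.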
These positives are tempered by the reality many of these institutions face: budgets are tight, staffs are small, and technical expertise might be lacking. This paper demonstrates how important and useful social media is in creating dynamic website content. Unfortunately, there is a gap in the library literature on accessibility and social media; even though social media content is ephemeral or lacks specific utility, libraries need to pay more attention to the various ways users access resources and information through social media, especially if that same content appears on the institution's website. The ASC's embedded homepage Instagram feed fits their needs, is accessible, and builds community around their unique collections. By providing all the code created in this project on GitHub,29 including the CSS we used, our hope is that institutions interested in this Instagram feed model can replicate it for their own purposes without extensive technical support.

Acknowledgments

I am thankful for the expertise of Carlie Magill, Donna McCrea, and Wes Samson. Without them this project would not have been possible.

References

1 Carlie Magill, e-mail message to author, August 8, 2017.
2 Michael Sutherland, "RSS Feed 2.0," Code4Lib 31, January 28, 2016, http://journal.code4lib.org/articles/11299.
3 Beth Evans, "Your Space or MySpace?" Library Journal 131 (2006): 8–12. Library, Information Science & Technology Abstracts, EBSCOhost.
4 "Social Media Fact Sheet," Pew Research Center, January 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
5 Ibid.
6 Brian S. Mathews, "Do You Facebook?" C&RL News, May 2006, http://crln.acrl.org/index.php/crlnews/article/viewfile/7622/7622.
7 Therese Fessenden, "A Theory of User Delight: Why Usability Is the Foundation for Delightful Experiences," Nielsen Norman Group, March 25, 2017, https://www.nngroup.com/articles/theory-user-delight/.
8 Ibid.
9 Daryl Green, "Utilizing Social Media to Promote Special Collections: What Works and What Doesn't" (paper, 78th IFLA General Conference and Assembly, Helsinki, Finland, June 2012), 11, https://www.ifla.org/past-wlic/2012/87-green-en.pdf.
10 Katrina Rink, "Displaying Special Collections Online," Serials Librarian 73, no. 2 (2017): 1–9, https://doi.org/10.1080/0361526x.2017.1291462.
11 Ibid.
12 Danielle Salomon, "Moving on from Facebook," College & Research Libraries News 74, no. 8 (2013): 408–12, https://crln.acrl.org/index.php/crlnews/article/view/8991.
13 Sarah Perez, "The Rise of Instagram," TechCrunch, April 24, 2012, https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/.
14 "Social Media Fact Sheet," Pew Research Center, January 12, 2017, http://www.pewinternet.org/fact-sheet/social-media/.
15 Lauren Wallis, "#SelfiesInTheStacks: Sharing the Library with Instagram," Internet Reference Services Quarterly 19, no. 3–4 (2014): 181–206, https://doi.org/10.1080/10875301.2014.983287.
16 Elizabeth Brookbank, "So Much Social Media, So Little Time: Using Student Feedback to Guide Academic Library Social Media Strategy," Journal of Electronic Resources Librarianship 27, no. 4 (2015): 232–47, https://doi.org/10.1080/1941126x.2015.1092344; Salomon, "Moving on from Facebook."
17 Wallis, "#SelfiesInTheStacks"; Salomon, "Moving on from Facebook."
18 Wendy Abbott et al., "An Instagram Is Worth a Thousand Words: An Industry Panel and Audience Q&A," Library Hi Tech News 30, no. 7 (2013): 1–6, https://doi.org/10.1108/lhtn-08-2013-0047.
19 salomon “moving on from facebook.” 20 “federal social media accessibility toolkit hackpad,” digital gov, accessed november 25, 2017, https://www.digitalgov.gov/resources/federal-social-media-accessibility-toolkit-hackpad/ . 21 ibid. 22 donna e. mccrea, “creating a more accessible environment for our users with disabilities: responding to an office for civil rights complaint,” archival issues 38, no. 1 (2017): 7, https://scholarworks.umt.edu/ml_pubs/25/ 23 “embedding,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/embedding/. 24 “authentication,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/authentication/ . 25 pranay deegoju, “embedding instagram feed in your website,” logical feed, december 25, 2015, https://www.logicalfeed.com/embedding-instagram-feed-in-your-website . 26 wes samson, “ws784512 instagram,” github, 2016, https://github.com/ws784512/instagram. 27 “sandbox mode,” instagram developer, accessed november 25, 2017, https://www.instagram.com/developer/sandbox/. 28 “terms of use,” instagram, accessed november 25, 2017, https://help.instagram.com/478745558852511; and “image-hashtag-feed,” digitoimisto dude oy, accessed november 25, 2017, https://github.com/digitoimistodude/image-hashtag-feed. 
29 samson, “ws784512 instagram.” https://crln.acrl.org/index.php/crlnews/article/view/8991 https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/ https://techcrunch.com/2012/04/24/the-rise-of-instagram-tracking-the-apps-spread-worldwide/ http://www.pewinternet.org/fact-sheet/social-media/ https://doi.org/10.1080/10875301.2014.983287 https://doi.org/10.1080/1941126x.2015.1092344 https://doi.org/10.1108/lhtn-08-2013-0047 https://doi.org/10.1108/lhtn-08-2013-0047 https://www.digitalgov.gov/resources/federal-social-media-accessibility-toolkit-hackpad/ https://scholarworks.umt.edu/ml_pubs/25/ https://www.instagram.com/developer/embedding/ https://www.instagram.com/developer/authentication/ https://www.logicalfeed.com/embedding-instagram-feed-in-your-website https://github.com/ws784512/instagram https://www.instagram.com/developer/sandbox/ https://help.instagram.com/478745558852511 https://github.com/digitoimistodude/image-hashtag-feed abstract introduction literature review mansfield library archives and special collections’ instagram feed acknowledgments references editorial: singularity—are we there, yet? | truitt 55 i n my last column, i wrote about two books—nicholas carr ’s the shallows and william powers’ hamlet’s blackberry—relating to learning in the always-on, always connected environment of “screens.”1 since then, two additional works have come to my attention. while i won’t be able to do them justice in the space i have here, they deserve careful consideration and open discussion by those of us in the library community. if carr’s and power’s books are about how we learn in an always-connected world of screens, sherry turkle’s alone together and elias aboujaoude’s virtually you are about who we are in the process of becoming in that world.2 turkle is a psychologist at mit who studies human– computer interactions. among her previous works are the second self (1984) and life on the screen (1995). 
Aboujaoude is a psychiatrist at the Stanford University School of Medicine, where he serves as director of the Obsessive Compulsive Disorder Clinic and the Impulse Control Disorders Clinic. Based on extensive coverage of specialist and popular literature, as well as numerous anonymized accounts of patients and subjects encountered by the authors, both works are characterized by thorough research and thoughtful analysis. While their approaches to the topic of "what we are becoming" as a result of screens may differ (Aboujaoude's, for example, focuses on "templates" and the terminology of traditional psychiatry, while Turkle's examines the relationship between loneliness and solitude, which are different, and how these in turn relate to the world of screens), their observations of the everyday manifestations of what might be called the pathology of screens bear many common threads. I'm acutely aware of the potential for injustice (at best) and misrepresentation or misunderstanding (rather worse) that I risk in seeking to distill two very complex studies into such a small space. And, frankly, I'm still trying to wrap my head around both the books and the larger issues they raise. With that caveat, I still think we should be reading about and widely discussing the phenomena reported, which many of us observe on a daily basis. In the sections that follow, I'd like to touch on a very few themes that emerge from these books.

"Why Do People No Longer Suffice?"3

A pair of anecdotes that Turkle recounts to explain her reasons for writing the current book seems worth sharing at the outset. In the first, she describes taking her then-fourteen-year-old daughter, Rebecca, to the Charles Darwin exhibition at New York's American Museum of Natural History in 2005. Among the many artifacts on display was a pair of live giant Galapagos tortoises: "One tortoise was hidden from view; the other rested in its cage, utterly still.
Rebecca inspected the visible tortoise thoughtfully for a while and then said matter-of-factly, 'They could have used a robot.'" When Turkle queried other bystanders, many of the children agreed, with one saying, "For what the turtles do, you didn't have to have live ones." In this case, "alive enough" was sufficient for the purpose at hand.4 Sometime later, Turkle read and publicly expressed her reservations about British computer scientist David Levy's book, Love and Sex with Robots, in which Levy predicted that by the middle of this century, "love with robots will be as normal as love with other humans, while the number of sexual acts and lovemaking positions commonly practiced between humans will be extended, as robots will teach more than is in all of the world's published sex manuals combined."5 Contacted by a reporter from Scientific American about her comments regarding Levy's book, Turkle was stunned when the reporter, equating the possibility of relationships between humans and robots with gay and lesbian relationships, accused her of likewise opposing these human-to-human relationships. If we have now reached a point where gay and lesbian relationships can strike us as comparable to human-to-machine relationships, something very important has changed; for Turkle, it suggested that we are on the threshold of what she terms the "robotic moment": "This does not mean that companionate robots are common among us; it refers to our state of emotional—and I would say philosophical—readiness. I find people willing to seriously consider robots not only as pets but as potential friends, confidants and romantic partners. We don't seem to care what these artificial intelligences 'know' or 'understand' of the human moments we might 'share' with them. At the robotic moment, the performance of connection seems connection enough. We are poised to attach to the inanimate without prejudice."6
marc truitt (marc.truitt@ualberta.ca) is associate university librarian, bibliographic and information technology services, university of alberta libraries, edmonton, alberta, canada, and editor of ital. 56 information technology and libraries | june 2011 while these examples are admittedly extreme, both authors agree that something very basic has changed in the way we conduct ourselves. turkle characterizes it as mobile technology having made each of us “pausable,” i.e., that a face-to-face interaction being interrupted by an incoming call, text message, or e-mail is no longer extraordinary; rather, in the “new etiquette,” it is “close to the norm.”10 and the rudeness, as well we know, isn’t limited to mobile communications. referring to “flame wars,” which regularly erupt in online communities, aboujaoude observes: the internet makes it easier to suspend ethical codes governing conduct and behavior. gentleness, common courtesy, and the little niceties that announce us as well-mannered, civilized, and sociable members of the species are quickly stripped away to reveal a completely naked, often unpleasant human being.11 even our routine e-mail messages—lacking as they often do salutations and closing sign-offs—are characterized by a form of curtness heretofore unacceptable in paper communications. remarkably, to those old enough to recall the traditional norms, the brusqueness is not only unintended, it is as well unconscious; “[we] just don’t think warmth and manners are necessary or even advisable in cyberspace.”12 ■■ castles in the air: avatars, profiles, and remaking ourselves as we wish we were finally, a place to love your body, love your friends, and love your life. —second life, “what is second life?”13 one of the interesting and worrisome themes in both turkle’s and aboujaoude’s studies is that of the reinvention and transformation of the self, in the form of online personas and avatars. 
this is the stock-in-trade of online communities and gaming sites such as facebook and second life. these sites cater to our nearly universal desire to be someone other than who we are: online, you’re slim, rich, and buffed up, and you feel you have more opportunities than in the real world. . . . we can reinvent ourselves as comely avatars. we can write the facebook profile that pleases us. we can edit our messages until they project the self we want to be.14 the problem is that for many there is an increasing fuzziness at the interface between real and virtual ■■ changing mores, or the triumph of rudeness i can’t think of any successful online community where the nice, quiet, reasonable voices defeat the loud, angry ones. . . . the computer somehow nullifies the social contract. —heather champ, yahoo!’s flickr community manager7 sadly, we’ve all experienced it. we get stuck on a bus, train, or in an elevator with someone engaged in a loud conversation on her or his mobile phone. all too often, the person is loudly carrying on about matters we wish we weren’t there to hear. perhaps it’s a fight with a partner. or a discussion of some delicate health matter. whatever it is, we really don’t want to know, but because of the limitations imposed by physical spaces, we can’t avoid being a party to at least half of the conversation. what’s wrong with these individuals? do they really have no consideration or sense of propriety? it turns out that in matters of tact and good taste, the ground has shifted, and where once we understood and abided by commonly accepted rules of conduct and respect for others, we do so no longer. indeed, the everyday obnoxious intrusions by those using public spaces for their private conversations are among the least of offenders. 
consider the following situations shared by turkle: sal, 62 years old, holds a small dinner party at his home as part of his “reentry into society” after several years of having cared for his recently deceased wife: i invited a woman, about fifty, who works in washington. in the middle of a conversation about the middle east, she takes out her blackberry. she wasn’t speaking on it. i wondered if she was checking her e-mail. i thought she was being rude, so i asked her what she was doing. she said that she was blogging the conversation. she was blogging the conversation.8 turkle later tells of attending a memorial service for a friend. several [attendees] around me used the [printed] program’s stiff, protective wings to hide their cell phones as they sent text messages during the service. one of the texting mourners, a woman in her late sixties, came over to chat with me after the service. matter-of-factly, she offered, “i couldn’t stand to sit that long without getting on my phone.” the point of the service was to take a moment. this woman had been schooled by a technology she’d had for less than a decade to find this close to impossible.9 enough” became yet more blurred. turkle’s anecdotes of children explaining the “aliveness” of these robots are both touching and disturbing. speaking of a tamagotchi, one child wrote a poem: “my baby died in his sleep. i will forever weep. then his batteries went dead. now he lives in my head.”19 the concept of “alive enough” is not unique to the very young, either. by 2009, sociable robots had moved beyond children’s toys with the introduction of paro, a baby seal-like “creature” aimed at providing companionship to the elderly and touted as “the most therapeutic robot in the world. . . . the children were onto something: the elderly are taken with the robots.
most are accepting and there are times when some seem to prefer a robot with simple demands to a person with more complicated ones.”20 where does it end? turkle goes on to describe nursebot, a device aimed at hospitals and long-term care facilities, which colleagues characterized as “a robot even sherry can love.” but when turkle injured herself in a fall a few months later, [i was] wheeled from one test to another on a hospital stretcher. my companions in this journey were a changing collection of male orderlies. they knew how much it hurt when they had to lift me off the gurney and onto the radiology table. they were solicitous and funny. . . . the orderly who took me to the discharge station . . . gave me a high five. the nursebot might have been capable of the logistics, but i was glad that i was there with people. . . . between human beings, simple things reach you. when it comes to care, there may be no pedestrian jobs.21 but need we librarians care about something as farfetched as nursebot? absolutely. now that ibm has proven that it can design a machine—okay, an array of machines, but something much more compact is surely coming soon—that can win at jeopardy!, is the robotic reference librarian really that much of a hurdle? take a bit of watson technology, stick it in nursebot, give it sensible shoes, and hey, i can easily imagine bibliobot, factory-standard in several guises, including perhaps donna reed (as mary, who becomes the town librarian in the alter-life of capra’s it’s a wonderful life) or shirley jones (as marian, the librarian, in the music man). i like donna reed as much as anyone, but do i really want reference assistance from her android doppelgänger? but then, for years after the introduction of the atm, i confess that i continued taking lunch hours off just so that i could deal with a “real person” at the bank, so perhaps it’s just me. the future is in the helping/service professions, indeed! 
and when we’re all replaced by robots (sociable and otherwise), what will we do to fill the time? personas: “not surprisingly, people report feeling let down when they move from the virtual to the real world. it is not uncommon to see people fidget with their smartphones, looking for virtual places where they might once again be more.”15 turkle speaks of the development of what she terms a “vexed relationship” between the real and the virtual: in games where we expect to play an avatar, we end up being ourselves in the most revealing ways; on social-networking sites such as facebook, we think we will be presenting ourselves, but our profile ends up as somebody else—often the fantasy of who we want to be. distinctions blur.16 and indeed, some completely lose sight of what is real and what is not. aboujaoude relates the story of alex, whose involvement in an online community became so consuming that he not only created for himself an online persona—“’i then meticulously painted in his hair, streak by streak, and picked “azure blue” for his eye color and “snow white” for his teeth.’”—but also left his “real” girlfriend after similarly remaking the avatar of his online girlfriend, nadia—“from her waist size to the number of freckles on her cheeks.” speaking of his former “real” girlfriend, alex said, “real had become overrated.”17 ■■ “don’t we have people for these jobs?”18 ageist disclaimer: when i grew up, robots—those that weren’t in science fiction stories or films—were things that were touted as making auto assembly lines more efficient, or putting auto workers out of jobs, depending on your perspective. while not technically a robot, the other machine that characterized “that time” was the automated teller machine (atm), which freed us from having to do our banking during traditional weekday hours, and not coincidentally resulted, again, in the loss of many entry-level jobs in financial institutions. 
as i recall, we were all reassured that the future lay in “helping/service” professions, where the danger of replacement by machines was thought to be minimal. now, fast forward 30 years. the first half of turkle’s book is the history of “sociable robots” and our interactions with them. moving from the reactions of mit students to joseph weizenbaum’s eliza in the mid-1970s, she recounts her studies of children’s interactions, first with electronic toys—e.g., tamagotchi—and later, with increasingly sophisticated and “alive” robots, such as furby, aibo, and my real baby. with each generation, these devices made yet more “demands” on their owners—for care, “feeding,” etc. and with each generation, the line between “alive” and “alive to admit that we’ve seen many examples of how connectedness between people we’d otherwise consider “normal” has changed and is changing our manners and mores.24 many libraries and other public spaces, reacting to patron complaints about the lack of consideration shown by some users, have had to declare certain areas “cell phone free.” in the interest of getting your attention, i’ve admittedly selected some fairly extreme examples from the two books at hand. however, i think the point is that, now that the glitter of always-on, always-connected, has begun to fade a bit, there is a continuum of dysfunctional behaviors that we are beginning to notice, and it’s time to talk about how we as librarians fit into all of this. are there things we in libraries are doing that encourage some of these less desirable and even unhealthy behaviors? which takes us to a second concern raised by some of my gentle draft-readers: we’ve heard this tale before. television, and radio before it, were technologies that, when they were new, were criticized as corrupting and leading us to all sorts of negative, self-destructive, and socially undesirable behaviors.
how are screens and the technology of always-connected any different? a part of me—the one that winces every time someone glibly refers to the “transformational” changes taking place around us—agrees. i was trained as a historian, to take a long view about change. and we’re talking about technologies that—in the case of the web— have been in common use for just over fifteen years. that said, my interest here is in seeing our profession begin a conversation about how connective technologies have influenced behavioral changes in people, and especially about how we in libraries may be unwittingly abetting those behavioral changes. television and radio were fundamentally different technologies in that they were one-way broadcast tools. and to the best of my recollection, neither has ever been widely adopted by or in libraries. yes, we’ve circulated videos and sound recordings, and even provided limited facilities for the playback of such media. but neither has ever really had an impact on the traditional core business of libraries, which is the encouragement and facilitation of the largely solitary, contemplative act of reading. connective technologies, in the form of intelligent machines and network-based communities, can be said to be antithetical to this core activity. we need to think about that, and to consider carefully the behaviors we may be encouraging. notwithstanding those critics of change in our profession who feel we move far too glacially, i would maintain that we have often been, if not at the forefront of the technology pack, then certainly among its most enthusiastic ■■ where from here? i titled this column “singularity.” for those not familiar with the literature of science fiction, turkle provides a useful explanation: this notion has migrated from science fiction to engineering. the singularity is the moment—it is mythic; you have to believe in it—when machine intelligence crosses a tipping point. 
past this point, say those who believe, artificial intelligence will go beyond anything we can currently conceive. . . . at the singularity, everything will become technically possible, including robots that love. indeed, at the singularity, we may merge with the robotic and achieve immortality. the singularity is technological rapture.22 i think it’s pretty clear that we’re still a fair distance from anything that one might reasonably term a singularity. but the concept is surely present, albeit to a somewhat less hubristic degree, when we speak in uncritical awe of “game-changing” or “transformational” technologies. turkle puts it this way: the triumphalist narrative of the web is the reassuring story that people want to hear and that technologists want to tell. but the heroic story is not the whole story. in virtual worlds and computer games, people are flattened into personae. on social networks, people are reduced to their profiles. on our mobile devices, we often talk to each other on the move and with little disposable time—so little, in fact, that we communicate in a new language of abbreviation in which letters stand for words and emoticons for feelings. . . . we are increasingly connected to each other but oddly more alone: in intimacy, new solitudes.23 some of my endlessly patient friends—the ones who provide both you and me with some measure of buffering from the worst of my rants in prepublication drafts of these columns—have asked questions about how all this relates to libraries, for example: how legitimate is it to generalize research findings from cases of obsessive-compulsive disorder to the broader population? the individuals studied are, of course, obsessive and compulsive, in relation to the internet and new technologies. do their behaviors not represent an extreme end of the population? a fair question. and yes, the examples i’ve provided in this column are admittedly somewhat extreme.
but turkle and aboujaoude both point to many examples that are far more common. i think all of us would have references and notes 1. marc truitt, “editorial: the air is full of people,” information technology and libraries 30 (mar. 2011): 3–5. http://www.ala.org/ala/mgrps/divs/lita/ital/302011/3001mar/editorial_pdf.cfm (accessed apr. 25, 2011). 2. sherry turkle, alone together: why we expect more from technology and less from each other (new york: basic books, 2011); elias aboujaoude, virtually you: the dangerous powers of the e-personality (new york: norton, 2011). 3. turkle, 19. 4. ibid., 3–4. 5. quoted in ibid., 5. 6. ibid., 9–10. emphasis added. 7. quoted in aboujaoude, 99. 8. turkle, 162. emphasis in original. 9. ibid., 295. 10. turkle, 161. 11. aboujaoude, 96. 12. ibid., 98. 13. quoted in turkle, 1. 14. ibid., 12. 15. ibid. 16. ibid., 153. 17. aboujaoude, 77–78. 18. turkle, 290. 19. ibid., 34. 20. ibid., 103–4. 21. ibid., 120–21. 22. ibid., 25. 23. ibid., 18–19. 24. for a recent and typical example, see david carr, “keep your thumbs still when i’m talking to you,” new york times, apr. 15, 2011, http://www.nytimes.com/2011/04/17/fashion/17text.html (accessed may 2, 2011). 25. aboujaoude, 283. adopters. in our quest to remain “relevant” to our university or school administrations, governing boards, and (in theory, at least) our patrons, we have embraced with remarkably little reservation just about every technology trend that’s come along in the past few decades. at the same time, we’ve been remarkably uncritical and unreflective about our role in, and the larger implications of, what we might be doing by adopting these technologies. aboujaoude, in a surprising, but i think largely correct summary comment, observes: extremely little is available, however, for the individual interested in learning more about how virtual technology has reshaped our inner universe and may be remapping our brains.
as centers of learning, public libraries, schools, and universities may be disproportionately responsible for this deficiency. they outdo one another in digitalizing their holdings and speeding up their internet connections, and rightfully see those upgrades as essential to compete for students, scholars, and patrons. in exchange, however, and with few exceptions, they teach little about the unintended, less obvious, and more personal consequences of the world wide web. the irony is, at least in some libraries’ case, that their very survival seems threatened by a shift that they do not seem fully engaged in trying to understand, much less educate their audiences about.25 i could hardly agree more. so, how do we answer aboujaoude’s critique? letter from the editor: farewell 2020 kenneth j. varnum information technology and libraries | december 2020 https://doi.org/10.6017/ital.v39i4.13051 i don’t think i’ve ever been so ready to see a year in the rear-view mirror as i am with 2020. this year is one i’d just as soon not repeat, although i nurture a small flame of hope. hope that as a society what we have experienced this year will exert a positive influence on the future. hope that we recall the critical importance of facts and evidence. hope that we don’t drop the effort to be better members of our local, national, and global communities and treat everyone equitably. hope that as a global populace we continue to get into “good trouble” and push back against institutionalized policies and practices of racism and discrimination and strive to be better. despite the myriad challenges this year has brought, it is welcome to see so many libraries continuing to serve their communities, adapting to pandemic restrictions, and providing new and modified access to books and digital information.
and equally gratifying, from my perspective as ital’s editor, is that so many library technologists continue to generously share what they have learned through submissions to this journal. along those lines, i’m extending my annual invitation to our public library colleagues to propose a contribution to our quarterly column, “public libraries leading the way.” items in this series highlight a technology-based innovation from a public library perspective. topics we are interested in could include any way that technologies have helped you provide or innovate service to your communities during the pandemic, but could touch on any novel, interesting, or promising use of technology in a public library setting. columns should be in the 1,000–1,500 word range and may include illustrations. these are not intended to be research articles. rather, public libraries leading the way columns are meant to share practical experience with technology development or uses within the library. if you are interested in contributing a column, please submit a brief summary of your idea. wishing you the best for 2021, kenneth j. varnum, editor varnum@umich.edu december 2020 highlights of lita board meetings these highlights are published to inform division members of the activities of their board. they are abstracted from the official minutes. 1981 ala annual conference san francisco first session june 29, 1981 board members present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, bonnie k. juergens, marilyn j. rehnberg, helen cyr, heike kordish, donald p. hammer. lita election results. vice-president/president-elect: carolyn m.
gray director-at-large: hugh atkinson ala councilor: bonnie k. juergens vccs vice-chairperson/chairperson-elect: mary h. karpinski vccs secretary: patricia m. paine vccs member-at-large: leon l. drolet, jr. avs chairperson: anne t. meyer avs vice-chairperson/chairperson-elect: louis r. pointon avs member-at-large: michael d. miller isas vice-chairperson/chairperson-elect: james c. thompson isas member-at-large: sherrie schmidt evaluation of electronic mail project. the members of the board reviewed their experiences and impressions with the ontyme electronic mail system. the general consensus was that the system was very good and everyone was pleased with it and wants to expand its use. the board has not yet used the source, although we are now subscribers to the system. motion was made by markuson, seconded by rehnberg, and passed that: the electronic mail project be extended through the midwinter meeting, 1982, with a total budget of $2,000 from the inception of the project. lita’s representation on ansi x-3. x-3 is the american national standards institute committee on computers and information processing. discussion included the mechanics of keeping the membership informed of proposed standards being considered, the large amount of time required of the representative to monitor, study, and disseminate the proposed standards, and the costs involved for lita to support a representative. juergens requested that if a division-wide representative to x-3 is appointed, that person should also be made ex officio to the isas/tesla committee or be liaison to the chair of isas. no action was taken. goals and long-range planning committee. kenney announced that she had appointed an ad hoc goals and long-range planning committee chaired by george abbott. directory of library systems in use. the suggestion was made that a directory of the many automated systems in use in libraries would be very useful.
a motion was made by markuson, seconded by kenney, and passed that: in response to inquiry about a directory to assist in identifying specific applications of technology in libraries, media, and information centers, that the publications committee explore the feasibility of an online lita directory of library, media, and information center use of technology. the investigation should consider format of description, potential of interactive online updating, and possible output byproducts, and should result in a draft rfp for consideration by the lita board for review at midwinter. president’s program at philadelphia. kenney announced her plans for the lita president’s program at the philadelphia ala annual conference. she is planning to transmit by satellite to fifty receiving sites around the country an “ala sampler” of outstanding technically-based programs from the philadelphia conference and short vignettes of what ala is all about. the subject of “on-line catalogs” has been chosen for the president’s program and segments of it and the rtsd/lita/rasd preconference institute on the same subject will be used. the program is intended for people who cannot get to ala conferences. if not enough registration is received by the coming ala midwinter meeting the whole activity would be cancelled. oral history project. at the 1980 new york ala conference, the suggestion was made that in the future many of the pioneers in the field of library automation will pass off the scene and it was felt that it was lita’s responsibility to capture for posterity the ideas and philosophy of those people. a motion was made by kenney, seconded by eaton, and passed that: an ad hoc committee be formed to investigate an oral history project in all aspects and submit a detailed set of alternative approaches for the board’s consideration. the library history roundtable will be informed of the committee’s activity and invited to participate.
second session june 30, 1981 board members present: s. michael malinconico, brigitte l. kenney, barbara e. markuson, nancy l. eaton, kenneth j. bierman, ronald f. miller, bonnie k. juergens, marilyn j. rehnberg, helen cyr, heike kordish, charles husbands, and donald p. hammer. journal of library automation vol. 14/4 december 1981 lita section reports: isas. bonnie juergens, chairperson of isas, reported that the section has approved three programs for the philadelphia conference. asis will be asked to cosponsor the program “information science, computer science, and library science: in search of common ground.” another program is “the uses of microcomputers in medium-sized public and academic libraries,” and the third one will be a detailed analysis and comparison of the marc format. juergens reported that the isas retrospective conversion discussion group and one of the same name in rtsd would like to combine. a motion was made by juergens, and passed that: isas pursue appropriate steps to invite the rtsd section which currently hosts a discussion group on retrospective conversion to combine that discussion group with the lita/isas retrospective conversion discussion group. the invitation to rtsd will include a specific description of mutual responsibilities. electronic library membership initiative group. (information report by richard sweeney, public library of columbus and franklin co., ohio; and neal kaske, oclc.) sweeney reviewed the discussions that took place at a meeting held in columbus on march 23–24, 1981 concerned with the whole area of remote electronic access to information and its impact on the library field. the group concluded that its members want to have some input on a very immediate level on the direction technology goes and the direction the policies and issues go. out of that meeting came a mission statement which is now the function statement of the ala electronic library membership initiative group (elmig).
sweeney read that statement and reported on the group’s concern for the future of libraries when these remote systems become established. he commented on the large number of programs and meetings on these areas that are not coordinated and not really providing the leadership our field should be giving. the almost total lack of research on these areas was also commented on. the need for the associations to provide the leadership was stressed. several members of the lita board expressed interest in providing a “home” for elmig within lita as many of lita’s interests are those of the mig. both groups are concerned with the same issues, it was pointed out. lita section reports: audio-visual section. avs recommended that an audiovisual task force be established, which would include other ala units, and would share information about their plans, and would try to avoid major schedule conflicts and overlaps. a motion was made by cyr, and passed that lita board approve ad hoc lita a-v section participation in a broad-based task force involving rtsd, pla, acrl, aasl and others to coordinate audiovisual-related activities. cyr asked the board’s sanction for an “a-v interest group breakfast” where people could just socialize and talk together. this would be sometime in the future. the board members had no objection. marbi committee report. elaine woods reported that the marbi committee is focusing more on the principles and the issues that need to be addressed in the marc format. the committee is current with l.c. proposals. marbi has drawn up a shopping list of issues to be addressed and they are now working on some of them. publications committee report. charles husbands informed the board that the publications committee feels it is time to change the title of jola. they have chosen a title of information technology and libraries, and it is to be effective with the march 1982 issue.
after discussion, a motion was made by bonnie juergens, and passed to that effect. the matter of raising the subscription price of jola was discussed. due to the fact that the division’s subsidy to the journal will greatly increase next budget year, the motion was made by ken bierman, and passed that: non-member prices for the journal of the division be increased to $20 for a one-year subscription and $5.50 for a single issue, effective with march 1982, and that the published member subscription price be raised sufficiently to conform to postal regulation. husbands requested that various members of the jola editorial board be included in the lita electronic mail system. approved by the board by consensus. husbands asked the board to keep in mind the possibility of publishing some of the results of the oral history project in jola. brian aveney asked the board to allow him to investigate the possibility of putting the full text of jola online. it would be an experiment to see what people would do with it. the board approved by consensus. aveney will return with a final proposal later. other such ideas were discussed including the proposals to put the “headlines” from the lita newsletter on the source, and to include the roster of lita committees in the oclc address directory. arrangements are in process for both of these activities. goals and long-range planning committee. george abbott, chairperson, asked the board’s permission to include his committee on lita’s electronic mail system. the intent would be to use it for text editing of committee documents. board approved by consensus. abbott reported that the committee expects to hold open hearings at midwinter and to have a basic document for discussion at that time. third session june 30, 1981 board members present: s. michael malinconico, brigitte l. kenney, ronald f. miller, kenneth j. bierman, marilyn j. rehnberg, heike kordish, and donald p. hammer. bylaws and organization report.
there have been seven changes to the lita bylaws that kordish will prepare in text form for the board to act on at midwinter in time for the spring ala ballot. ala priorities survey. ron miller reported that the ala executive board took action on the ala priorities and there are five of them. briefly, they are access to information, legislation and funding, intellectual freedom, public awareness, and personnel resources. joint council on educational telecommunications. lynne bradley reported that jcet has established a task force to bring information to its members about the new technologies and how they can best be used in education. since lita members have much of the necessary expertise, bradley suggested that lita organize a one-day program for jcet. some board members were very much interested and bradley was asked to work with the lita program planning committee to organize such a program. program planning committee. sue tyner reported that the telecommunications committee will hold a preconference institute at the philadelphia annual conference called “the teleconference center.” it is intended to teach librarians how to set up a teleconference center. the lita group that has been putting on the “data processing specifications and contracting” workshops has been asked to hold a workshop prior to the ifla meeting. malinconico suggested that the board adopt a policy of lita costs plus 15 percent, but that a subcommittee of the lita program planning committee should be set up to define policy in this area. carolyn gray was suggested as a person for this committee. marilyn rehnberg, chairperson of vccs, reported a request from national audio-visual association asking lita to put on a “video showcase” for the seminar part of the nava annual conference in anaheim in january.
LITA Board of Directors Meetings: Record of Votes, 1981 Annual Conference. Motions (in order of appearance in the "Highlights"):

Board member              1  2  3  4  5  6  7  8
S. Michael Malinconico    Y  Y  Y  Y  Y  Y  Y  Y
Brigitte L. Kenney        Y  Y  Y  Y  Y  Y  Y  Y
Barbara E. Markuson       Y  Y  Y  Y  Y  Y  Y  Y
Nancy L. Eaton            Y  Y  Y  Y  Y  Y  Y  Y
Kenneth J. Bierman        Y  Y  Y  Y  Y  Y  Y  Y
Ronald F. Miller          0  0  0  Y  Y  Y  Y  Y
Angie W. LeClerq          0  0  0  0  0  0  0  0
Helen Cyr                 Y  Y  Y  Y  Y  Y  Y  Y
Bonnie K. Juergens        Y  Y  Y  Y  Y  Y  Y  Y
Marilyn J. Rehnberg       Y  Y  Y  Y  Y  Y  Y  Y

Key: Y = yes, A = abstain, 0 = absent

President's Message. Andromeda Yelton. Information Technology and Libraries | March 2018. Andromeda Yelton (andromeda.yelton@gmail.com) is LITA President 2017–18 and Senior Software Engineer, MIT Libraries, Cambridge, Massachusetts.

In my last president's message, I talked about change — ITAL's transition to new leadership — and imagination — Wakanda and the archival imaginary. Today change and imagination are on my mind again as LITA contemplates a new path forward: potentially becoming a new combined division with ALCTS and LLAMA. As you may have already seen on LITA Blog (http://litablog.org/2018/02/lita-alcts-and-llama-document-on-small-division-collaboration/), the three divisional leadership teams have been envisioning this possibility, and all three division boards discussed it at Midwinter. While the idea sprang out of our shared challenges with financial stability, in discussing it we've realized how much opportunity we have to be stronger together. For instance, we've heard for years that you, LITA members, want more of a leadership training pathway, and more ways to stay involved with your LITA home as you move into management; alignment with LLAMA automatically opens up all kinds of possibilities. They have an agile divisional structure with their communities of practice and an outstanding set of leadership competencies.
And anyone involved with library technology knows that we live and die by metadata, but we aren't all experts in it; joining forces with ALCTS creates a natural home for people no matter where they are (or where they're going) on the technology/metadata continuum. ALCTS also runs far more online education than LITA and runs a virtual conference. Meanwhile, of course, LITA has a lot to offer to LLAMA and ALCTS. You already know how rewarding the networking is, and how great the depth of expertise on technology topics. We also bring strong publications (like this very journal), marquee conference programs (like Top Tech Trends and the Imagineering panel), and a face-to-face conference. (Speaking of which, please pitch a session (http://bit.ly/2gpgxdf) for the 2018 LITA Forum!) I want to emphasize that no decisions have been made yet. The outcome of our three board discussions was that we all feel there is enough merit to this proposal to explore it further, but none of us are formally committed to this direction. Furthermore, it is not practically or procedurally possible to make a change of this magnitude until at least 2019. In the meantime, we expect there will be numerous working groups to determine if and how this all could work, as well as open forums for the membership of all three divisions to express hopes, concerns, and ideas. Personally, my highest priority is to ensure that you, the members, continue to have a divisional home: one that gives you learning opportunities and a place for professional camaraderie, and that is on solid financial footing so it can continue to be here for you in the long term.
President's Message | March 2018. https://doi.org/10.6017/ital.v37i1.10386

So, I'm excited about the possibilities that a superhero team-up affords, but I'm even more excited to hear from you. Do you find this prospect thrilling, scary, both? Do you think we should absolutely go this way, or definitely not, or maybe but with caveats and questions? Please tell me what you think. You can submit anonymous feedback and questions at https://bit.ly/litamergefeedback. I will periodically collate and answer these questions on LITA Blog. You can also reach out to me personally any time (andromeda.yelton@gmail.com).

Automated Storage & Retrieval System: From Storage to Service. Justin Kovalcik and Mike Villalobos. Information Technology and Libraries | December 2019. Justin Kovalcik (jdkovalcik@gmail.com) is Director of Library Information Technology, CSUN Oviatt Library. Mike Villalobos (mike.villalobos@csun.edu) is Guest Services Supervisor, CSUN Oviatt Library.

Abstract. The California State University, Northridge (CSUN) Oviatt Library was the first library in the world to integrate an automated storage and retrieval system (AS/RS) into its operations. The AS/RS continues to provide efficient space management for the library. However, added value has been identified in materials security and inventory as well as customer service. The concept of library as space, paired with improved services and efficiencies, has resulted in the AS/RS becoming a critical component of library operations and future strategy.
Staffing, service, and security opportunities, paired with support and maintenance challenges, enable the library to provide a unique critique and assessment of an AS/RS.

Introduction. "Space is at a premium" is a phrase not unique to libraries; however, due to the inclusive and open environment promoted by libraries, their floor space is especially attractive to those within and outside of the building's traditional walls. In many libraries, the majority of floor space is used to house the library's collection. In the past, as collections grew, floor space became increasingly limited. Faced with expanding expectations and demands, libraries struggled to balance transforming space for new services with adding materials to a growing collection. In addition to management activities like weeding, other solutions such as offsite storage and compact shelving rose in popularity as methods to create library space in the absence of new building construction. Years later, as collections move away from print and physical materials, libraries are beginning to reexamine their buildings' space and envision new features and services. "Now that so many library holdings are accessible digitally, academic libraries have the opportunity to make use of their physical space in new and innovative ways."1 The CSUN Oviatt Library took a novel approach and launched the world's first automated storage and retrieval system (AS/RS) in 1991 as a storage solution to resolve its building space limitations. The project was a California State University (CSU) system Chancellor's Office initiative that began in 1989 and cost more than $2 million to implement. The original concept "came from the warehousing industry, where it had been used by business enterprises for years."2 By leveraging and storing physical materials in the AS/RS, the CSUN Oviatt Library is able to create space within the library for new activities and services.
"Instead of simply storing information materials, the library space can and should evolve to meet current academic needs by transforming into an environment that encourages collaborative work."3

Automated Storage & Retrieval System | Kovalcik and Villalobos. https://doi.org/10.6017/ital.v38i4.11273

Unfortunately, as the first stewards of an AS/RS, CSUN made decisions that led to mismanagement and neglect, resulting in the AS/RS facing many challenges in becoming a stable and reliable component of the library. However, recent efforts have sought to resolve these issues and have resulted in updates to the system's software, management, and functionality. Whereas in the past low-use materials were placed in the AS/RS to create space for new materials, now materials are moved into the AS/RS to create space for patrons, secure collections, and improve customer service. As part of this critical review, the functionality and maintenance, along with the historical and current management, of the AS/RS will be examined.

Background. CSUN is the second-largest member of the twenty-three-campus CSU system. The diverse university community includes over 38,000 students and more than 4,000 employees.4 Consisting of nine colleges offering 60 baccalaureate degrees, 41 master's degrees, 28 credentials in education, and various extended learning and special programs, CSUN provides a diverse community with numerous opportunities for scholarly success.5 The CSUN Oviatt Library's AS/RS is an imposing and impressive area of the library that routinely attracts onlookers and has become part of the campus tour. The AS/RS is housed in the library's east wing and occupies an area that is 8,000 square feet and 40 feet high, arranged into six aisles.
The 13,260 steel bins, each 2 × 4 feet, in heights of 6, 10, 12, 15, and 18 inches, are stored on both sides of the aisles, enabling the AS/RS to store an estimated 1.2 million items.6 Each aisle has a storage retrieval machine (SRM) that performs automatic, semiautomatic, and manual "picks" and "deposits" of the bins.7 The AS/RS was assessed in 2014 as responsibilities, support, and expectations of the system shifted and previous configurations were no longer viable. Discontinued and failing equipment, unsupported server software, inconsistent training and use, and decreased local support and management were identified as impediments to greater involvement in library projects and operations. The campus provided funding in 2015 to update the server software as well as major hardware components on three of the six aisles. Divided into two phases, the server software upgrade was completed in May 2017, followed by the hardware upgrade in January 2019.8

Literature Review. The continued growth of students, faculty, and academic programs, along with evolving expectations and needs since the late 1980s, has required the library to analyze library services and examine the building's physical space and storage capacity. In the late 1980s, identifying space for increasing printed materials was the main contributing factor in implementing the AS/RS. In the mid-2010s, creating space within the library for new services was dependent on a stable and reliable AS/RS. "The conventional way of solving the space problem by adding new buildings and off-site storage facilities was untenable."9 A benefit of an AS/RS, as Creaghe and Davis predicted in 1986, was that with "the probable slow transition from books to electronic media, an AAF [automated access facility] may postpone the need for future library construction indefinitely."10 The AS/RS has enabled the library to create space by removing physical materials while enhancing customer service, material security, and inventory control.
"The role of the library as service has been evolving in lockstep with user needs. The current transformative process that takes place in academia has a powerful impact on at least two functional areas of the library: library as space and library as collection."11 In addition, the "increased security the AAF … offers will save patrons time that would be spent looking for books on the open shelves that may be in use in the library, on the waiting shelves, misplaced, or missing."12 In subsequent years, library services have evolved to include computer labs with multiple high-use printers/scanners/copiers, instructional spaces, individual and group study spaces, makerspaces, etc., in addition to campus entities that have required large amounts of physical space within the library. "It is well-known that academic libraries have storage problems. Traditional remedies for this situation—used in libraries across the nation—include off-site storage for less used volumes, as well as, more recently, innovative compact shelving. These solutions help, but each has its disadvantages, and both are far from ideal. . . . When the Eastern Michigan University library had the opportunity to move into a new building, we saw that an AS/RS system would enable us to gain open space for activities such as computer labs, training rooms, a cafe, meeting rooms, and seating for students studying."13 The AS/RS provides all the space advantages of off-site storage and compact shelving while adding much more value, mitigating the time delays of off-site storage and the confusion of accessing and using compact shelving.

Staffing & Usage, 1991–1994. Following the 80/20 principle, low-use items were initially selected for storage in the AS/RS. "When the storage policy was being developed in [the] 1990s, the 80/20 principle was firmly espoused by librarians. . . .
Thus, by moving lower-use materials to AS/RS, the library could still ensure that more than 80% of the use of the materials occurs on volumes available in the open stacks."14 Low-use items were identified if one of the following three conditions was met: (1) the item's last circulation date was more than five years ago; (2) the item was a non-circulating periodical; or (3) the item was not designed to leave an area and received little patron usage, such as the reference collection. In 1991, the AS/RS was loaded with 800,000 low-use items and went live for the first time later that year. Staffing for the initial AS/RS department consisted of one full-time AS/RS supervisor (40 hours/week), one part-time AS/RS repair technician (20 hours/week), and 40 hours a week of dedicated student employees, for a total of 100 hours a week of dedicated AS/RS management. The AS/RS was largely utilized as a specialized service for internal library operations, with limited patron-initiated requests. AS/RS operations were uniquely created and customized for each AS/RS operator as well as for the desired task to be performed. Skills were developed internally, with knowledge and training shared by word of mouth or accompanied by limited documentation.

2000–Mid-2000s. The AS/RS department functioned in this manner until the 1994 Northridge earthquake struck the campus directly and required partial reconstruction of the library building. Although there was no damage to the AS/RS itself or its surrounding structure, extensive damage occurred in the wings of the library. The damage resulted in the library building being closed and inaccessible. When the library reopened in 2000, it was determined that, due to previous low AS/RS usage, a dedicated department was no longer warranted. The AS/RS supervisor position was dissolved, the student employee budget was eliminated, and the AS/RS technician position was not replaced after the employee retired in 2008.
AS/RS operational responsibilities were consolidated into the circulation department and AS/RS administration into the systems department. Both the circulation and systems departments redefined their roles and responsibilities to include the AS/RS without additional budgetary funding, staffing, or training. In order for AS/RS operations to be absorbed by these departments, changes had to occur in the administration, operating procedures, staffing assignments, and access to the AS/RS. All five circulation staff members and twenty student employees received informal training from members of the former AS/RS department in the daily operations of the AS/RS. The circulation members also received additional training for first-tier troubleshooting of AS/RS operations such as bin alignments, emergency stops, and inventory audits. The AS/RS repair technician remained in the systems department; however, AS/RS troubleshooting responsibility was shared among the systems support specialists, and dedicated AS/RS support was lost. The administrative tasks of scheduling preventive maintenance services (PMs), resolving AS/RS hardware/equipment issues with the vendor, and maintaining the server software remained with the head of the systems department. Without a dedicated department providing oversight for the AS/RS, issues and problems began to occur frequently. Circulation had neither the training nor the resources available to master procedures or enforce quality control measures. Similarly, the systems department became increasingly removed from daily operations. Many issues were not reported at all and became viewed as system quirks that required workarounds, or were viewed as limitations of the system.
For issues that were reported, troubleshooting had to start all over again, and systems relied on circulation staff being able to replicate the issue in order to demonstrate the problem. Systems personnel retained little knowledge of performing daily operations, and troubleshooting became more complex and problematic as different operators had different levels of knowledge and skill that accompanied their unique procedures.

Mid-2000s–2015. These issues became further exacerbated when areas outside of circulation were given full access to the AS/RS in the mid-2000s. Employees from different departments of the library began entering and accessing the AS/RS area and operated the AS/RS based on knowledge and skills they learned informally. Student assistants from these other departments also began accessing the area and performing tasks on behalf of their informally trained supervisors. Further, without access control, employees as well as students ventured into the "pit" area of the AS/RS, where the SRMs move and end-of-aisle operations occur. This area contains many hazards and is unsafe without proper training. During this period, the special collections and archives (SC/A) department loaded thousands of uncataloged, high-use items into the AS/RS that required specialized service from circulation. These items were categorized as "non-Library of Congress," and inventory records were entered into the AS/RS software manually by various library employees. In addition, paper copies were created and maintained as an independent inventory by SC/A. Over the years, the SC/A paper inventory copies were found to be insufficiently labeled, misidentified, or missing. Therefore, the AS/RS software inventory database and the SC/A paper copy inventory contained conflicts that could not be reconciled. To resolve this situation, an audit of SC/A materials was completed in spring 2019 to locate inventory that was thought to be missing.
All bound journals and current periodicals were eventually loaded into the AS/RS as well, causing other departments and areas to rely on the AS/RS more heavily. Departments such as interlibrary loan and reserves, as well as patrons, began requesting materials stored in the AS/RS more routinely and frequently. The AS/RS transformed from a storage space with limited usage to an active area with simultaneous usage requests of different types throughout the day. Without a dedicated staff to organize, troubleshoot, and provide quality control, there was an abundance of errors that led to long waits for materials, interdepartmental conflicts, and unresolved errors. High-use materials from SC/A, as well as currently received periodicals from the main collection, were the catalysts that drove and eventually warranted change in the AS/RS usage model from storage to service. The inclusion of these materials created new primary customers identified as internal library departments: SC/A and interlibrary loan (ILL). With over 4,000 materials contained in the AS/RS, SC/A requires prompt service for processing archival material into the AS/RS and filling specialized patron requests for these materials. In addition, ILL processes over 500 periodical requests per month that utilize and depend on AS/RS services. The additional storage and requests created an uptick in overall AS/RS utilization that carried over into circulation desk operations as well.

2015–Present. The move from storage to service was not only inevitable due to an evolving AS/RS inventory, but was necessary in order to regain quality control and manage the library-wide projects that involved the AS/RS. The increased usage of and reliance on the AS/RS required that the system be well maintained and managed. Administration of the AS/RS remains within systems, and circulation student employees continue to provide supervised assistance to the AS/RS.
The crucial change identified and emerging within circulation was the need for a dedicated operations and project manager. An AS/RS lead position was created with responsibilities for the daily operations and management of the system and service. However, this was not a complete return to the original staffing concept of the early 1990s. The concept for this new position focuses on project management and system operations rather than the original sole attention to system operations. The AS/RS lead is the point of contact for all library projects that utilize the AS/RS, relays any AS/RS issues or concerns to systems, and oversees daily AS/RS usage. This shift was necessary due to the increased demand on and reliance on the system that has changed its charge from storage to service.

Customer Service. The library noted over time that the AS/RS could be used as a tool in weeding and other collection-shift projects to create space and aid in reorganizing materials. As more high-use materials were loaded into the AS/RS, the indirect advantages of the AS/RS became more apparent. Patrons request materials stored within the AS/RS through the library's website and pick up the materials at the circulation desk. There is no need for patrons to navigate the library, successfully use the classification system, and search shelves to locate an item that may or may not be there. As Kirsch notes, "the ability to request items electronically and pick them up within minutes eliminates the user's frustration at searching the aisles and floors of an unfamiliar library."15 The vast majority of library patrons are CSUN students who commute and must make the best use of their time while on campus. Housing items in the AS/RS creates the opportunity to have hundreds of thousands of items all picked up and returned to one central location.
This makes it far easier for library patrons, especially users with mobility challenges, to engage with a plethora of library materials. The time allotted for library research and/or enjoyment becomes more productive as their desired materials are delivered within minutes of arriving in the building. As Heinrich and Willis state, "the provision of the nimble, just-in-time collection becomes paramount, and the demand for AS/RS increases exponentially."16 AS/RS items are more readily available than shelved items on the floor, as it takes minutes to have AS/RS items returned and made available once again. "They may be lost, stolen, misshelved, or simply still on their way back to the shelves from circulation—we actually have no way of knowing where they are without a lengthy manual search process, which may take days. . . . Unlike books on the open shelves, returned storage books are immediately and easily 'reshelved' and quickly available again."17 Another advantage is that there is no need to keep materials in call-number order, with the unpleasant reality of missing and misshelved items. Items in the AS/RS are assigned bin locations that can only be accessed by an operator- or user-initiated request. The workflow required to remove a material from the AS/RS involves multiple scans and procedures that increase accountability in a way that does not exist for items stored on floor shelves. Further, users are assured of an item's availability within the system. Storing materials in the AS/RS ensures that items are always checked out when they leave the library and not sitting unaccounted for in library offices and processing areas. It also avoids patron frustration over misshelved, recently checked-out, or missing items.
Security. The decision to follow the 80/20 principle and place low-use items in the AS/RS meant high-use items remained freely available to library patrons on the open shelves of each floor. This resulted in high-use items being available for patron browsing and checkout, as well as patron misuse and theft. The sole means of securing these high-use items involved tattle-tape and installing security gates at the main entrance. Therefore, the development of policies and procedures for the enforcement of these gates was also required. Beyond the inherent cost, maintenance, and issue of ensuring items are sensitized and desensitized correctly, gate enforcement became another issue that rested upon the circulation department. Assuming theft would occur by exiting the building through the gates at the main entrance of the library, enforcement is limited in the actions that may be performed by library employees. Touching, impeding the path of, following, detaining, and searching library patrons are restricted actions reserved for campus authorities such as the police, not library employees. Rather than attempting to enforce a security mechanism over which we have no authority, the AS/RS provides an alternative for the security of high-use and valuable materials. Storing items in the AS/RS eliminates the possibility of theft or damage by visitors and places control and accountability over the internal use of materials. "There would be far fewer instances of mutilation and fewer missing items."18 Further, access to the AS/RS area was restricted from all library personnel to only circulation and systems employees, with limited exceptions. Individual logins also provide a method of control and accountability, as each operator is required to use a personal account rather than a departmental account to perform actions on the AS/RS. Materials stored in the AS/RS are, "more significantly . . .
safer from theft and vandalism."19

Inventory. Conducting a full inventory of a library collection is time consuming, expensive, and often inaccurate by the time of completion. Missing or lost items, shelf-reading projects, in-process items, etc., create overhead for library employees and generate frustration for patrons searching for an item. Massive, library-wide projects such as collection shifts and weeding are common endeavors undertaken to create space, remove outdated materials, and improve collection efficiency. However, actions taken on an open-shelves collection are time consuming, costly, and inefficient, and they affect patron activities. These projects typically involve months of work across multiple departments. Items stored within the AS/RS do not experience these challenges because the system is managed by a full-time employee throughout the year and not on a project basis. The system is capable of performing inventory audits and does not affect public services. Therefore, while the cost of an item on an open shelf is $0.079, the cost of storing the same item in the AS/RS is $0.02.20 Routine and spot audits ensure an accurate inventory, confirm the capacity level of the system, and establish best management of the bins. AS/RS inventory audits are highly accurate and much more efficient than shelf reading, with little impact on patron services. "While this takes some staff time, it is far less time-consuming than shelf reading or searching for misshelved books."21 Storing materials in the AS/RS is more efficient than on open shelves; however, bin management is essential in ensuring bins are configured in the best arrangement to achieve optimal efficiency. The size and configuration of bins directly affect storage capacity. The type of storage, random or dedicated, also influences capacity, efficiency, and accessibility of items.
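The random-versus-dedicated distinction can be illustrated with a small assignment sketch. This is hypothetical code, not the library's AS/RS software: the bin labels, collection names, and the `assign_bins` helper are all invented for demonstration. Dedicated storage pins each collection to a fixed bin, so the most-used collections can be given bins that staff could still reach if a retrieval machine were out of service.

```python
# Illustrative sketch only: bin labels and collection names are hypothetical,
# not drawn from the actual CSUN AS/RS configuration.
REACHABLE_BINS = ["A1-01", "A1-02", "A2-01"]      # end-of-aisle, manually pullable
DEEP_BINS = ["A1-40", "A2-40", "A3-40", "A4-40"]  # retrievable only by an SRM

def assign_bins(collections_by_use):
    """Map collection names to fixed bins, highest-use collections first
    (i.e., dedicated rather than random storage)."""
    ranked = sorted(collections_by_use, key=lambda pair: -pair[1])
    bins = REACHABLE_BINS + DEEP_BINS
    return {name: bin_id for (name, _uses), bin_id in zip(ranked, bins)}

plan = assign_bins([("reference", 120), ("SC/A archives", 400), ("bound journals", 90)])
print(plan["SC/A archives"])  # the highest-use collection lands in a reachable bin
```

Random storage, by contrast, would place each incoming item in whatever bin has space, maximizing density at the cost of predictability, which is the trade-off the passage above describes.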
The 13,260 steel bins in the AS/RS range in height from 6 to 18 inches. The most commonly used bins are the 10- and 12-inch bins; however, there is a finite number of each bin height. Unfortunately, the smallest and largest bins are rarely used due to material sizes and weight capacity; therefore, optimal AS/RS capacity is unattainable, and the number of materials eligible for loading is limited by the number of bins available. The library also determined that dedicated, rather than random, bin storage aided in locating specialized materials, reduced loading and retrieval errors, and enhanced accessibility by arranging highly used bins in reachable locations. In the event an SRM breaks down and an aisle becomes nonfunctional for retrieving bins, strategically placing the highest-used and specialized locations in bins that can be manually pulled is a proactive strategy. However, this requires dedicated bins with an accurate and known inventory that has been arranged in accessible locations.

Lessons Learned

Disasters & Security. In 1994, the AS/RS proved to provide a much more stable and secure environment than the open stacks when it successfully endured a 6.9 earthquake. The reshelving of more than 300,000 items required a crew of more than thirty personnel over a year to complete. Many items were destroyed by the impact of falling to the floor and being buried underneath hundreds of other items. The AS/RS, in contrast, held over 800,000 items and successfully sustained the brunt of the earthquake's impact with no damage to any of the stored items. Unfortunately, the materials that had been loaded into the AS/RS in 1991 were low-use items that were viewed as one step from weeding.
Therefore, high-use items stored on open shelves were damaged and required the long process of recovery and reconstruction: identifying and cataloging damaged and undamaged materials, disposal of those damaged, renovation of the area, and purchase of new items. The low-use items stored in the AS/RS, by contrast, required only that a few bins that had slightly shifted be pushed back fully into their slots. AS/RS items have proven to be more secure from misplacement, theft, and physical damage from earthquakes as compared to items on open shelves.

Maintenance, Support, and Modernization. The CSUN Oviatt Library has received two major updates to the AS/RS since it was installed in 1991. In 2011, the AS/RS received updates for communication and positioning components. The second major update occurred in two phases between 2016 and 2018 and focused on software and equipment. In phase one, server and client-side software was updated from the original software created in 1989. In phase two, half the SRMs received new motors, drives, and controllers. Due to the many years of reliance on preventive maintenance (PM) visits and avoidance of modernization, our vendors were unable to provide support for the AS/RS software and had difficulty locating equipment that had become obsolete. Preventive maintenance visits were used to maintain the status quo and are not a long-term strategy for maintaining a large investment and critical component of business operations. Creaghe and Davis note that "current industrial facility managers report that with a proper AAF [automated access facility] maintenance program, it is realistic to expect the system to be up 95–98 percent of the time."22 PM service is essential for long-term AS/RS success; however, preventive maintenance alone is incapable of modernizing a system and ensuring equipment and software do not become obsolete. Maintenance is not the same as support; rather, maintenance is one aspect of support.
support includes points of contact who are available for troubleshooting, spare supplies on hand for quick repairs, a life-cycle strategy for major components, and long-term planning and budgeting. kirsch attested the following, describing eastern michigan university’s strategy: “although the dean is proud and excited about this technology, he acknowledges that just like any computerized technology, when it’s down, it’s down. to avoid system problems, emu bought a twenty-year supply of major spare parts and employs the equivalent of one-and-a-half full-time workers to care for its automated storage and retrieval system.”23 a system that relies solely on preventive maintenance will quickly become obsolete and require large and expensive projects in the future if the system is to continue functioning. further, modernization provides an avenue for new features and functions to be realized that increase functionality and efficiency.

networking
the csun oviatt library on average receives three to four visits a year along with multiple emails and phone conversations requesting information from different libraries regarding the as/rs. these conversations aid the library by offering different perspectives on the as/rs and force the library to review current practices.

information technology and libraries | december 2019 122

the library has learned through speaking with many different libraries that the needs, design, and configuration of an as/rs can be as unique as the libraries inquiring. the csun oviatt library, for example, is much different than the three other csu system libraries that have an as/rs. because our system is outdated, it has been difficult to form or establish meaningful groups or share information, since the systems are all different from each other. as more conversations occur and systems become more modern and standard, there is potential for knowledge sharing as well as group lobbying efforts for features and pricing.
buy in
user confidence in any system is required in order for that system to be successful. convincing a user base to move materials from readily available open shelves into steel bins housed within inaccessible 40-foot-high aisles will be difficult if the system is consistently down. therefore, the better the as/rs is managed and supported, the more reliable and dependable that system will be, and the more likely user confidence will grow. informing stakeholders of long-term planning and welcoming feedback demonstrates that the system is being supported and managed with an ongoing strategy that is part of future library operations. similarly, administrators need confirmation that large investments and mission-critical services are stable, reliable, and efficient. creating a new line item in the budget for as/rs support and equipment life-cycle requires justification along with a firm understanding of the system. in addition, staffing and organizational responsibilities must also be reviewed in order to establish an environment that is successful and efficient. continuous assessments of the as/rs regarding downtime, projects involved, services and efficiencies provided, etc., aid in illustrating the importance and impact of the system on library operations as a whole.

recording usage and statistics
unfortunately, usage statistics were not recorded for the as/rs prior to june 2017. therefore, data is unavailable to analyze previous system usage, maintenance, downtime, or project involvement. data-driven decisions require the collection of statistics for system analysis and assessment. following the server software and hardware updates, efforts have been taken to record project statistics, inventory audits, and srm faults, as well as public and internal paging requests.
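the kind of record keeping described above can be prototyped in a few lines; the event types and fields below are illustrative assumptions, not the library's actual schema:

```python
from collections import Counter
from datetime import date

# hypothetical event log of as/rs activity; field names are illustrative
events = [
    {"day": date(2019, 6, 3), "type": "public_paging_request"},
    {"day": date(2019, 6, 3), "type": "internal_paging_request"},
    {"day": date(2019, 6, 4), "type": "srm_fault", "srm": 3},
    {"day": date(2019, 6, 4), "type": "public_paging_request"},
]

# aggregate counts by event type for periodic assessment reports
totals = Counter(e["type"] for e in events)
print(totals["public_paging_request"])  # 2
```

even a flat log like this is enough to answer the assessment questions raised above (downtime, fault rates, paging volume) once it is collected consistently.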
conclusion
the as/rs remains, as heinrich and willis described it, “a time-tested innovation.”24 through lessons learned and objective assessment, the library is positioning the as/rs to be a critical component for future development and strategy. by expanding the role of the as/rs to include functions beyond low-use storage, the library discovered efficiencies in material security, customer service, inventory accountability, and strategic planning. the csun oviatt library has learned, experienced, and adjusted its perception, treatment, and usage of the as/rs over the past thirty years. factors such as access to the area, staffing, and inventory auditing are easily overlooked, while other potential functions such as material security and customer services may not be identified without ongoing analysis and assessment. critical review, without a limited or biased perception, has enabled the library to realize the greater functionality the as/rs is able to provide.

notes
1 shira atkinson and kirsten lee, “design and implementation of a study room reservation system: lessons from a pilot program using google calendar,” college & research libraries 79, no. 7 (2018): 916–30, https://doi.org/10.5860/crl.79.7.916.
2 helen heinrich and eric willis, “automated storage and retrieval system: a time-tested innovation,” library management 35, no. 6/7 (august 5, 2014): 444-53, https://doi.org/10.1108/lm-09-2013-0086.
3 atkinson and lee, “design and implementation of a study room reservation system,” 916–30.
4 “about csun,” california state university, northridge, february 2, 2019, https://www.csun.edu/about-csun.
5 “colleges,” california state university, northridge, may 8, 2019, https://www.csun.edu/academic-affairs/colleges.
6 estimated as/rs capacity was calculated by determining the average size and weight of an item for each size of bin along with the most common bin layout. the average item was then used to determine how many could be stored along the width and length (and, if appropriate, height) of the bin, and the counts were then multiplied. many factors affect the overall capacity, including bin layout (with or without dividers), stored item type (book, box, records, etc.), weight of the items, and operator determination of full, partial, or empty bin designation. the as/rs mini-loaders have a weight limit of 450 pounds, including the weight of the bin.
7 “automated storage and retrieval system (as/rs),” csun oviatt library, https://library.csun.edu/about/asrs.
8 “automated storage and retrieval system (as/rs),” csun oviatt library, https://library.csun.edu/about/asrs.
9 heinrich and willis, “automated storage and retrieval system,” 444-53.
10 norma s. creaghe and douglas a. davis, “hard copy in transition: an automated storage and retrieval facility for low-use library materials,” college & research libraries 47, no. 5 (september 1986): 495-99, https://doi.org/10.5860/crl_47_05_495.
11 heinrich and willis, “automated storage and retrieval system,” 444-53.
12 creaghe and davis, “hard copy in transition,” 495-99.
13 linda shirato, sarah cogan, and sandra yee, “the impact of an automated storage and retrieval system on public services,” reference services review 29, no. 3 (september 2001): 253-61, https://doi.org/10.1108/eum0000000006545.
14 heinrich and willis, “automated storage and retrieval system,” 444-53.
15 sarah e.
kirsch, “automated storage and retrieval—the next generation: how northridge’s success is spurring a revolution in library storage and circulation,” paper presented at the acrl 9th national conference, detroit, michigan, april 8-11, 1999, http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/pdf/kirsch99.pdf.
16 heinrich and willis, “automated storage and retrieval system,” 444-53.
17 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
18 kirsch, “automated storage and retrieval.”
19 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
20 cost of material management was calculated by removing building operational costs (lighting, hvac, carpet, accessibility/open hours, etc.) and focusing on the management of the material instead. the management cost of materials (or unit cost) is determined by dividing the total amount of fixed and variable costs by the total number of units: dividing the $31,500 annual shelving student budget by 400,000 items equals $0.079 per material per year in open shelves; dividing the $18,000 annual as/rs student budget by 900,000 items equals $0.02 per material per year in the as/rs.
21 shirato, cogan, and yee, “the impact of an automated storage and retrieval system,” 253-61.
22 creaghe and davis, “hard copy in transition,” 495-99.
23 kirsch, “automated storage and retrieval.”
24 heinrich and willis, “automated storage and retrieval system,” 444-53.
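the unit-cost comparison in note 20 is a straightforward division, which can be checked directly; a minimal sketch using the figures given in the note:

```python
def unit_cost(annual_budget, items):
    # unit cost = total fixed and variable costs / total number of units
    return annual_budget / items

open_shelves = unit_cost(31_500, 400_000)  # approx. $0.079 per material per year
asrs = unit_cost(18_000, 900_000)          # $0.02 per material per year
```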
a systematic approach towards web preservation
muzammil khan and arif ur rahman
information technology and libraries | march 2019 71

muzammil khan (muzammilkhan86@gmail.com) is assistant professor, department of computer and software technology, university of swat. arif ur rahman (badwanpk@gmail.com) is assistant professor, department of computer science, bahria university islamabad.

abstract
the main purpose of the article is to divide the web preservation process into small explicable stages and design a step-by-step web preservation process that leads to creating a well-organized web archive. a number of research articles about web preservation projects and web archives were studied, and a step-by-step systematic approach for web preservation was designed. the proposed comprehensive web preservation process describes and combines the strengths of different techniques observed during the study for preserving digital web contents in a digital web archive. for each web preservation step, different approaches and possible implementation techniques have been identified that can be adopted in digital archiving. the potential value of the proposed model is to guide archivists, related personnel, and organizations to effectively preserve their intellectual digital contents for future use. moreover, the model can help to initiate a web preservation process and create a well-organized web archive to efficiently manage the archived web contents.
a section briefly describes the implementation of the proposed approach in a digital news stories preservation framework for archiving news published online from different sources.

introduction
the amount of information generated by institutions is increasing with the passage of time. one of the mediums that uses this information is the world wide web (www). the www has become a tool to share information quickly with everyone regardless of their physical location. the number of web pages is vast: google and bing each index approximately 4.8 billion.1 though the www is a rapidly growing source of information, it is fragile in nature. according to the available statistics, 80 percent of pages become unavailable after one year, and 13 percent of links (mostly web references) in scholarly articles are broken after 27 months.2 moreover, 11 percent of posts and comments on websites for various purposes are lost within a year. according to another study, conducted on 10 million web pages collected from the internet archive in 2001, the average survival time of web pages is 1,132.1 days with a standard deviation of 903.5 days; 90.6 percent of those web pages are inaccessible today.3 this fragility causes valuable scholarly, cultural, and scientific information to vanish and become inaccessible to future generations. in recent years, it was realized that the lifespan of digital objects is very short, and rapid technological changes make it more difficult to access these objects. therefore, there is a need to preserve the information available on the www.
digital preservation is performed using the primary methods of emulation and migration, in which emulation provides the preserved digital objects in their original format while migration provides objects in a different format.4

systematic approach towards web preservation | khan and ur rahman 72
https://doi.org/10.6017/ital.v38i1.10181

in the last two decades, a number of institutions worldwide, such as national and international libraries, universities, and companies, started to preserve their web resources (resources found at a web server, i.e., web contents and web structure). the first web archive, the internet archive, was initiated in 1996 by brewster kahle; it holds more than 30 petabytes of data, which includes 279 billion web pages, 11 million books and texts, and 8 million other digital objects such as audio, video, and image files. more than seventy web archive initiatives have been started in 33 countries since 1996, which shows the importance of web preservation projects and the preservation of web contents. this information era encourages librarians, archivists, and researchers to preserve the information available online for upcoming generations. while digital resources may not replace the information available in physical form, the digital version of these information resources improves access to the available information.5 there are different aspects of the preservation process and web archiving, e.g., digital objects’ ingestion into the archive during the preservation process, digital objects’ format and storage, archival management, administrative issues, access and security of the archive, and preservation planning. these aspects need to be understood for effective web preservation and will help in addressing the challenges that occur during the preservation process. the reference model for open archival information systems (oais) is an attempt to provide a high-level framework for the development and comparison of digital archives.
in web preservation, a challenging task is to identify the starting point of the preservation process and to complete the process effectively, which makes it possible to proceed to the other activities. moreover, the complicated nature of the web and the complex structure of web contents make the preservation of web content even more difficult. the oais reference model helps in achieving the goals of a preservation task in a step-by-step manner. the stakeholders are identified, i.e., producer, management, and consumer, and the packages that need to be processed, i.e., the submission information package (sip), archival information package (aip), and dissemination information package (dip), are clearly defined.6 this study aims to design a step-by-step systematic approach for web preservation that helps in understanding the challenges of preservation or archival activities, especially those that relate to digital information objects at various steps of the preservation process. the systematic approach may lead to an easy way to analyze, design, implement, and evaluate the archive with clarity and with different options for an effective preservation process and archival development. an effective preservation process is one that leads to a well-organized, easily managed web archive and accomplishes designated community requirements. this approach may help to address the challenges and risks that confront archivists and analysts during preservation activities.

step-by-step systematic approach
digital preservation is “the set of processes and activities that ensure long-term, sustained storage of, access to and interpretation of digital information.”7 the growth and decline rates of www content and the importance of the information presented on the web make it a key candidate for preservation. web preservation confronts a number of challenges due to the web’s complex structure, the variety of available formats, and the type of information (purpose) it provides.
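the oais package flow sketched above (a sip from the producer is ingested into an aip, and a dip is derived for the consumer) can be illustrated schematically; the class and field names below are illustrative only, not part of the oais standard's formal data model:

```python
from dataclasses import dataclass

@dataclass
class Package:
    contents: str   # the web resources being preserved
    metadata: dict  # descriptive and preservation metadata

def ingest(sip: Package) -> Package:
    """the archive turns a producer's sip into an aip by adding
    preservation metadata (fixity value is a placeholder)."""
    meta = dict(sip.metadata, package_type="aip", fixity="<sha-256 digest>")
    return Package(sip.contents, meta)

def disseminate(aip: Package) -> Package:
    """on a consumer request, the archive derives a dip from the aip."""
    return Package(aip.contents, dict(aip.metadata, package_type="dip"))

sip = Package("news-story.html", {"package_type": "sip", "producer": "crawler"})
dip = disseminate(ingest(sip))
print(dip.metadata["package_type"])  # dip
```

the point of the sketch is the separation of concerns: what the producer submits, what the archive stores, and what the consumer receives are three distinct packages derived from one another.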
the overall layout of the web varies from domain to domain based on the type of information and its presentation. websites can be categorized based on two things: first, the type of information (i.e., the web contents), and second, the way this information is presented (i.e., the layout or structure of the web page). examples include educational, personal, news, e-commerce, and social networking websites, which vary a lot in their contents and structure. the variations in the overall layout make it difficult to preserve different web contents in a single web archive. the web preservation activities are summarized in figure 1. the following sections explain the web preservation activities and their possible implementation in the proposed systematic approach.

defining the scope of the web archive
the www provides an opportunity to share information using various services, such as blogs, social networking websites, e-commerce, wikis, and e-libraries. these websites provide information on a variety of topics and address different communities based on their interests and needs. there are many differences in the way information is handled and presented on the www. in addition, the overall layout of the web changes from one domain to another.8 therefore, it is not practically feasible to develop a single system to preserve all types of websites for the long term. so, before starting to preserve the web, the archivist should define the scope of the web to be archived. the archive will be either a site-centric, topic-centric, or domain-centric archive.9

site-centric archive
a site-centric archive focuses on a particular website for preservation. these types of archives are mostly initiated by the website creator or owner. site-centric web archives allow access to old versions of the website.
topic-centric archive
topic-centric archives are created to preserve information on a particular topic published on the web for future use. for scientific verification, researchers need to refer to the available information, while it is difficult to ensure access to these contents due to the ephemeral nature of the web. a number of topic-centric archive projects have been carried out, including the archipol archive of dutch political websites,10 the digital archive for chinese studies (dachs),11 minerva by the library of congress,12 and the french elections web archive for archiving websites related to the french elections.13

domain-centric archive
the word “domain” refers to a location, network, or web extension. a domain-centric archive covers websites published under a specific dns domain name, using either a top-level domain (tld), e.g., .com, .edu, or .org, or a second-level domain (sld), e.g., .edu.pk or .edu.fr. an advantage of domain-centric archiving is that the archive can be created by automatically detecting specific websites. several projects have a domain-centric scope, e.g., the portuguese web archive (pwa) of national websites,14 the kulturarw, a swedish royal library web archive collection of .se and .com domain websites,15 and the uk government web archive collection of uk government websites, e.g., .gov.uk domain websites.

understanding the web structure
after defining the scope of the intended web archive, the archivist will have a better understanding of the interests and expected queries of the intended community based on the resources available or the information provided by the selected domain. the focus in this step is to understand the type of information (contents) provided by the selected domain and how the information has been presented. the web can be understood along two dimensions.

figure 1. systematic approach for web preservation process.

the first dimension considers the web as a medium that communicates contents using various protocols, e.g., http, and the second considers the web as a content container, which presents the contents to the viewers and is not simply contents, e.g., the underlying technology used to display the contents.16 the preservation team should understand such parameters as the technical issues, the future technologies, and the expected inclusion of other related content.

identify the web resources
the archivist should understand the contents and the representation of the contents of the selected domain, e.g., blogs, social networking websites, institutional websites, educational institutional websites, newspaper websites, or entertainment websites. all of these websites provide different information and address individual communities that have distinct information needs. a web page is the combination of two things, i.e., web contents and web structure.17 the resources which can be preserved are as follows.

web contents
web contents or web information can be categorized into the following categories:
• textual contents (plain text): this category describes textual information that appears on a web page. it does not include links, behaviors, and presentation stylesheets.
• visual contents (images): these contents are the visual forms of information or are complementary material to the information provided in textual form.
• multimedia contents: as another form of information, multimedia contents mainly include audio and video. they may also include animation or even text as part of a video or a combination of text, audio, and video.

web structure
web structure can be categorized into the following categories:
• appearance (web layout or presentation): this category indicates the overall layout or presentation of a web page.
the look and feel of a web page (the representation of the contents) is important and is maintained with different technologies, e.g., html or stylesheets.
• behavior (code navigations): characterized by link navigations, these can be links within a website or to other websites, external document links, or dynamic and animated features, such as live feeds, comments, tagging, or bookmarking.

identify designated community
the archivist should identify the designated community of the intended web archive and carefully analyze their functional requirements and expected queries. the designated community means the potential users who may access the archived web contents for different purposes, e.g., accessing old information that is not available in normal circumstances, referring to an old news article that was not bookmarked properly, or retrieving relevant news articles published long ago.

prioritize the web resources
after a comprehensive assessment of the resources of the selected domain and the identification of potential users’ requirements and expected queries, the archivist should prioritize the web resources. the complexity of web resources and their representation causes complications in the digital preservation process. generally, it may be undesirable or unviable to preserve all web resources; therefore, it is worthwhile to designate the web resources for preservation. priority should be assigned on the basis of two things: first, the potential reuse of the resource, and second, the frequency with which the resource will be accessed. resources with no value, little value, or those managed elsewhere can be excluded.
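the content categories described above (textual, visual, multimedia, and structural resources) can be operationalized crudely by mapping file extensions to categories; the mapping below is an illustrative assumption, not a standard:

```python
import os

# illustrative extension-to-category map for harvested web resources
CATEGORIES = {
    "textual":    {".html", ".htm", ".txt", ".xml"},
    "visual":     {".png", ".jpg", ".jpeg", ".gif", ".svg"},
    "multimedia": {".mp3", ".mp4", ".avi", ".webm"},
}

def categorize(resource_url: str) -> str:
    """assign a harvested resource to a content category by extension."""
    ext = os.path.splitext(resource_url)[1].lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "structure"  # stylesheets, scripts, and other layout assets

print(categorize("https://example.org/story.html"))  # textual
```

in practice an archive would inspect the http content-type header rather than the extension, but the categorization step itself is the same.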
for prioritization of resources, the moscow method can be applied.18 the acronym moscow can be elaborated as:
m (must have): the resource must be preserved or must be a part of the archive. for example, in the digital news story archive (dnsa), the textual news story must be preserved in the archive because the preservation emphasis is on the textual news story.19 online news contains textual news stories, many news stories contain associated images, and a fraction of news stories contain associated audio-video contents.
s (should have): the resource should be preserved if at all possible. almost all news stories have associated images; a few news stories have associated audio and video that complement them and should be preserved as part of the news story in the web archive.
c (could have): the resource could be preserved if it does not affect anything else or is nice to have. the web structure in dnsa depends on the resources to be used for the preservation of news stories; the layout of the newspaper website could be part of the preservation process if it does not affect anything else, e.g., storage capacity and system efficiency.
w (won’t have): the resource would not be included. archiving multiple versions of the layout or structure of the online newspaper is not worthwhile, and hence they would not be preserved.
the prioritization of these resources is very important in the context of web preservation planning because it avoids wasting time and energy and is the best way to handle users’ requirements and fulfill their expected queries.

how to capture the resource(s)
the selection of a feasible capturing technique depends on two things: first, the resources to be captured, and second, the frequency of the capturing task. there are three web resource capturing techniques, i.e., capture by browser, by web crawler, and by authoring system.
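the moscow scheme, as applied to the dnsa resource types above, might be encoded like this; the priority mapping follows the paper's example, while the code structure and names are assumptions:

```python
# moscow priorities for dnsa web resources, following the example above
MOSCOW = {
    "textual_news_story":  "M",  # must have
    "images":              "S",  # should have
    "audio_video":         "S",  # should have
    "site_layout":         "C",  # could have, if storage/efficiency allow
    "old_layout_versions": "W",  # won't have
}

def to_preserve(priorities, include_could=False):
    """select resources to archive: always m and s, optionally c, never w."""
    keep = {"M", "S"} | ({"C"} if include_could else set())
    return [r for r, p in priorities.items() if p in keep]

print(to_preserve(MOSCOW))  # ['textual_news_story', 'images', 'audio_video']
```

the `include_could` switch mirrors the conditional nature of the c category: "could have" items enter the archive only when capacity and efficiency permit.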
each capturing technique has associated advantages and disadvantages.7

web capturing using browsers
the intended web content can be captured using browsers after a web page is rendered, when the http transaction occurs. this technique is also referred to as a snapshot or post-rendering technique. the method captures those things which are visible to the users; the behavior and other attributes remain invisible. capturing only static contents is one of the disadvantages of the browser approach; this approach generally preserves contents in the form of images. it is best for well-organized websites, and commercial tools are available for capturing the web. the following are well-known tools for capturing the web using browsers.

webcapture (https://web-capture.net/) is a free online web-capturing service. it is a fast web page snapshot tool that can grab web pages in seven different formats, i.e., jpeg, tiff, png, and bmp image formats, and pdf, svg, and postscript files of high quality. it also allows downloading the intended format in a zip file and is suitable for long vertical web pages with no distortion in layout.

a.nnotate (http://a.nnotate.com/) is an online annotating web snapshot tool to keep track of information gathered from the web efficiently and easily. it allows adding tags and notes to the snapshot and building a personal index of web pages as a document index. the annotation feature can be used for multiple purposes, for example, compiling an annotated library of objects for an organization, sharing commented web pages, product comparison, etc.

snagit (https://www.techsmith.com/screen-capture.html) is a well-known snapshot tool for capturing screens with built-in advanced image editing features and screen recording. snagit is a commercial and advanced screen capture tool that can capture web pages with images, linked files, source code, and the url of the web page.
acrobat webcapture (file > create > pdf from web page...) creates a tagged pdf file from the web page that a user visits, while the adobe pdf toolbar is used for the entire website.20

the capture-by-browser technique has the following advantages:
• the archivist can capture only the displayed contents, which is an advantage if you need to preserve the displayed contents only.
• it is a relatively simple technique for well-organized websites.
• commercial tools exist for web capturing using browsers.
in addition, the disadvantages are the following:
• capturing displayed contents only is a disadvantage if the focus is not only on displayed contents.
• it results in frozen contents and treats contents as if they were publications.
• it loses the web structure, such as the appearance, behavior, and other attributes of the web page.

web capturing using an authoring system/server
the authoring system capturing technique is used for web harvesting directly from the website hosting server. all the contents, e.g., textual information, images, and source code, are collected from the source web server. the authoring system allows the archivist to preserve different versions of the website. the authoring system depends on the infrastructure of the content management system and is not a good choice for external resources. the system is best for an owned web server and works well for limited internal purposes. the web curator tool (http://webcurator.sourceforge.net/), pandas (an old british library harvesting tool), and netarchivesuite (https://sbforge.org/display/nas/netarchivesuite) are known tools used for planning and scheduling web harvesting. they can be used by non-technical personnel for both the selection and harvesting of web content according to selection policies. these web archiving tools were developed in a collaboration of the national library of new zealand and the british library and are used for the uk web archive (http://www.ariadne.ac.uk/issue50/beresford/).
the tools can interface with web crawlers, such as heritrix (https://sourceforge.net/projects/archivecrawler/). authoring systems are also referred to as workflow systems or curatorial tools.

the authoring system has the following advantages:
• it is best for web harvesting, as it captures everything available.
• it is easy to perform if you have proper access permission or you own the server or system from which the resources are captured.
• it works for short- to medium-term resources and is feasible for internal access within organizations.
the disadvantages of web capturing using the authoring system are:
• it captures all available raw information, not only presentations.
• it may be too reliant on the authoring infrastructure or the content management system.
• it is not feasible for long-term resources or for external access from outside the organization.

web capturing using web crawlers
web crawlers are perhaps the most widely used technique for capturing web contents in a systematic and automated manner.21 crawler development needs expertise and experience with different tools, i.e., the positives and negatives of technologies and the viability of a tool in a specific scenario. the main advantage of crawlers is that they extract embedded content. heritrix, httrack, wget, and deeparc are common examples of web crawlers.

heritrix (https://github.com/internetarchive/heritrix3/wiki) is an open-source, freely available web crawler developed in java by the internet archive. heritrix is one of the most widely used extensible, web-scale web crawlers in web preservation projects. initially, heritrix was developed for purpose-specific crawling of particular websites; it is now a resourceful, customizable web crawler for archiving the web.

httrack (https://www.httrack.com/) is a freely available configurable browser utility.
httrack crawls html, images, and other files from a server to a local directory and allows offline viewing of the website. the httrack crawler downloads a complete website from the web server to a local computer system and makes it available for offline viewing with all its related link structure, so that it seems as if the user is browsing it online. it also updates archived websites at the local system from the server and resumes interrupted extractions. httrack is available for both windows and linux/unix operating systems.

wget (http://www.gnu.org/software/wget/) is a freely available non-interactive command line tool that can easily be configured with other technologies and different scripts. it can capture files from the web using the widely used ftp, ftps, http, and https protocols, and supports cookies as well. it also updates archived websites and resumes interrupted extractions. wget is available for both microsoft windows and unix operating systems.

the advantages of web crawling:
• it is a widely used capturing technique.
• it can capture specific content or everything.
• it avoids some of the access issues, such as link rewriting and embedding external content from an archive or the live web.
disadvantages associated with web crawling:
• much work is required, as well as tools or development expertise and experience.
• the web crawler may not have the right scope: sometimes it does not capture everything that it should, and sometimes it captures too much content.

web content selection policy
in the previous steps, the web resources were identified and prioritized based on the requirements and expected queries of the designated community, and a feasible capturing technique was identified based on the capturing frequency. now, the contents need to be prepared and filtered for selection, and a feasible selection approach needs to be chosen based on the contents.
A web content selection policy helps to determine and clarify which web contents need to be captured, based on the priorities, purpose, and scope of the web contents already defined.22 The selection policy decision comprises a description of the context, the intended users, the access mechanisms, and the expected uses of the archive. The selection policy may comprise the selection process and the selection approach.

The selection process can be divided into subtasks which, in combination, provide a reasonably qualitative selection of web contents: preparation, discovery, and filtering, as shown in figure 2. The main objective of the preparation phase is to determine the targeted information space, the capturing technique and tools, the extension categorization, the granularity level, and the frequency of the archiving activity. The people best placed to help with preparation are domain experts, regardless of the scope of the web archive; the domain experts may be archivists, researchers, librarians, or any other authoritative reference, i.e., a document or a research article. The tools defined in the preparation phase help to discover the intended information in the discovery phase, which can be divided into the following four categories:
1. Hubs may be global or topical directories, collections of sites, or even a single web page with essential links related to a particular subject or topic.
2. Search engines can facilitate discovery through a precise query or a set of alternative queries related to a topic. Specialized search engines can significantly improve the results of discovering related information.
3. Crawlers can be used to extract web contents such as textual information, images, audio, video, and links. Moreover, the overall layout of a web page or of a whole website can also be extracted in a well-defined, systematic manner.
4.
External sources may be non-web sources of any kind, such as printed material or mailing lists, which can be monitored by the selection team.

The main objective of the discovery phase is to determine the sources of information to be stored in the archive. This determination can be achieved in two ways, corresponding to two discovery methods: exogenous and endogenous. In the first, a manually created entry point list determines the entry points (usually links) for crawling; the collection is selected manually and the list is updated during the crawl. Exogenous discovery is used in manual selection and relies mostly on exploiting an entry point list of hubs, search engines, and non-web documents. In the second, an automatically created entry point list determines the entry points by extracting links automatically, yielding an updated list on every crawl. Endogenous discovery is used in automatic selection and relies on link extraction using crawlers that explore the entry point list.

Figure 2. Selection process.

The main objective of the filtering phase is to optimize and condense the discovered web contents (the discovery space). Filtering is important in order to collect more specific web content and to remove unwanted or duplicated content. Usually an automatic filtering method is used for preservation; manual filtering is useful when robots or automatic tools cannot interpret the web content. The discovery and filtering phases can be combined practically or logically. Several evaluation axes can be used for the selection policy (e.g., quality, subject, genre, and publisher). The selection approach can be either automatic or manual.
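The filtering step, removing duplicated content from the discovery space, is easy to approximate in code. The sketch below deduplicates a discovered URL list by normalizing each URL (lowercasing the host, dropping fragments and default ports) before comparing; the normalization rules and example URLs are illustrative assumptions, not a standard.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Reduce a URL to a canonical form for duplicate detection."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Drop the default port, which does not change the resource.
    if parts.scheme == "http" and host.endswith(":80"):
        host = host[:-3]
    path = parts.path or "/"
    # Fragments address positions inside a page, not distinct pages.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))

def filter_discovery_space(urls):
    """Keep the first occurrence of each normalized URL."""
    seen, kept = set(), []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept

discovered = [
    "http://Example.edu/a",
    "http://example.edu:80/a#section2",   # duplicate of the first
    "http://example.edu/b",
]
print(filter_discovery_space(discovered))
```

Real archives also deduplicate by content hash, since the same page is often reachable under several distinct URLs.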
Manual content selection is very rare because it is labor intensive: it requires automatic tools to find the content and then manual review of that collection to identify the subset that should be captured. Automatic selection policies are used frequently in web preservation projects for web collection, especially for web archives.23 The choice of collection approach depends on the frequency with which the web content is to be preserved in the archive. There are four different selection approaches for web content collection.

Unselective approach
The unselective approach implies collecting everything possible: the whole website and its related domains and subdomains are downloaded to the archive. It is also referred to as automatic harvesting or selection, bulk selection, or domain selection.24 This approach is used where a web crawler usually performs the collection, for example, collecting all websites from a domain (i.e., .edu, meaning all educational institution websites, at the domain level) or collecting all possible contents/pages from a website by extracting the embedded links (harvesting at the website level). A section of the data preservation community believes that it is technically a relatively cheap and quick collection approach that yields a comprehensive picture of the web as a whole. Its significant drawbacks, in contrast, are that it generates a huge amount of unsorted, duplicated, and potentially useless data, consuming too many resources.
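Domain-level harvesting, as in the .edu example, reduces in code to a predicate over the hostname. A minimal sketch, with an invented seed list, selecting everything under the .edu domain:

```python
from urllib.parse import urlsplit

def in_domain(url, suffix):
    """True if the URL's hostname falls inside the given domain suffix."""
    host = urlsplit(url).hostname or ""
    return host == suffix or host.endswith("." + suffix)

seeds = [
    "http://cs.example.edu/",        # invented seed URLs
    "http://www.example.com/",
    "http://library.example.edu/news",
]
# Unselective harvest at domain level: keep every .edu host.
harvest = [u for u in seeds if in_domain(u, "edu")]
print(harvest)
```

The same predicate, applied inside the crawl loop rather than to a fixed seed list, is what keeps an unselective domain crawl from wandering off into other domains.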
The Swedish Royal Library's project Kulturarw3, which harvests websites at the domain level, i.e., collecting websites from the .se domain (websites physically located in Sweden), was one of the first projects to adopt this approach.25 National web archive initiatives usually adopt the unselective approach, most notably NEDLIB, a Helsinki University Library harvester, and AOLA, an Austrian online archive.26

Selective approach
The selective approach was adopted by the National Library of Australia (NLA) in the PANDAS project in 1997. In this approach, a website is included for archiving based on certain predefined strategies and on the access and information provided by the archive. The Library of Congress's project MINERVA and the British Library project "Britain on the Web" are other known projects that have adopted the selective approach. According to the NLA, the selected websites are archived based on NLA guidelines after negotiation with the owners.27 The inclusion decision can be taken at one of the following levels:
• Website level: which websites should be included from a selected domain, e.g., archiving all educational websites from the top-level domain .pk.
• Web page level: which web pages should be included from a selected website, e.g., archiving the homepages of all educational websites.
• Web content level: which types of web content should be preserved, e.g., archiving all the images from the homepages of educational websites.
A selective approach is best when the number of websites to be archived is very large, or when the archiving process targets the entire WWW and needs to narrow its scope by identifying the resources in which the archivists are most interested. This approach makes implicit or explicit assumptions about the web contents that are not to be selected for preservation. It can be very helpful to initiate a pilot preservation project, which identifies: What is possible?
What can be managed? In addition, some tangible results may be obtained easily and quickly in order to broaden the scope of the project later. The selective approach may be based on predefined criteria or on an event.

A criteria-based selective approach involves selecting web resources against various predefined sets of criteria. The NLA's guidance characterizes the criteria-based selective approach as the "most narrowly defined method" and describes it as "thematic selection." Simple or complex content-selection criteria can be defined, depending on the overall goal of preservation: for example, all resources owned by an organization; all resources of one genre, e.g., all programming blogs; resources contributing to a common subject; resources addressing a specific community within an institution, e.g., students or staff; all publications belonging to an individual organization or group of organizations; or all resources that may benefit external users or an external user community, e.g., historians or alumni.

An event-based selective approach involves selecting web resources or websites related to various time-based events. The archivists may focus on websites that address important national or international events, e.g., disasters, elections, or the football World Cup. Event-based websites have two characteristics: (1) very frequent updates, and (2) content that is lost after a short time, e.g., a few weeks or months. Examples include the start and end of a term or academic year, the duration of an activity such as a research project, or the appointment or departure of a new senior official.
Deposit approach
In the deposit collection approach, the information package is submitted by the administrator or owner of the website and includes a copy of the website with the related files that can be accessed through its hyperlinks. This archival information package approach is applicable to small collections (of a few websites), or the owner of the website can initiate the preservation project, e.g., a company can initiate a project to preserve its own website. The deposit approach was adopted by the National Archives and Records Administration (NARA) for the collection of US federal agency websites in 2001 and by Die Deutsche Bibliothek (DDB, http://deposit.ddb.de/) for the collection of dissertations and some online publications. New digital initiatives are heavily dependent on administrator or owner support and provide an easy way to deposit new content into the repository; for example, in MacEwan University's institutional repository, the librarians leading the project tried to offer an easy and effective way for depositors to submit their archival contents.28

Combined approach
There are advantages and disadvantages associated with each collection approach, and the ongoing debate is about which approach is best in a given situation; for example, the deposit approach requires an inexpensive agreement with the depositors. The emphasis is on combining the automatic harvesting and selective approaches, as these two are cheaper than the other selection approaches because they require only a few staff and can cope with the technological challenges. This initiative was taken by the Bibliothèque nationale de France (BnF) in 2006.
The BnF automatically crawls information about updated web pages, stores it in an XML-based "site delta," and uses page relevance and importance, similar to how Google ranks pages, to evaluate individual pages.29 The BnF used a selective approach for the deep web (that is, web pages or websites that are behind a password or otherwise not generally accessible to search engines), referred to as the "deposit track."

Metadata identification
Cataloging is required to discover a specific item in a digital collection: an identifier or set of identifiers is required to retrieve a digital record from a digital repository or an archive. For digital documents, this catalog, registration, or identifier is referred to as metadata.30 Metadata are structured information about resources that helps describe, locate (discover or place), manage, retrieve (access), and use digital information resources. Metadata are often referred to as "data about data" or "information about information," but it may be more helpful and informative to describe them as "descriptive and technical documentation."31 Metadata can be divided into the following three categories:
1. Descriptive metadata describe a resource for discovery and identification purposes. They may consist of elements such as a document's title, author(s), abstract, and keywords.
2. Structural metadata describe how compound objects are put together, for example, how sections are ordered to form chapters.
3. Administrative metadata impart information to facilitate resource management, such as when and how a file was created, who can access it, its type, and other technical information.
Administrative metadata are classified into two types: (1) rights management metadata, which address intellectual property rights, and (2) preservation metadata, which contain the information needed to archive and preserve a resource.32

Owing to new information technologies, digital repositories, especially web-based repositories, have grown rapidly over the last two decades. This growth has prompted the digital library community to devise metadata strategies for managing the immense amount of data stored in digital libraries.33 Metadata play a vital role in the long-term preservation of digital objects, and it is important to identify the metadata that will help to retrieve a specific object from the archive after preservation. According to Duff et al., "the right metadata is the key to preserving digital objects."34 Hundreds of metadata standards have been developed over the years for different user environments, disciplines, and purposes; many of them are in their second, third, or nth edition.35 Digital preservation and archiving require metadata standards to trace digital objects and ensure access to them. Several of the common standards are briefly discussed below.

The Dublin Core Metadata Initiative (DCMI, http://dublincore.org/) was initiated at the second World Wide Web conference in 1994 and was standardized as ANSI/NISO Z39.85 in 2001 and ISO 15836 in 2003.36 The main purpose of the DCMI was to define an element set for representing web resources; initially, thirteen core elements were defined, later increased to a fifteen-element set. The elements are optional, repeatable, can appear in any order, and are expressed in XML.37

The Metadata Encoding and Transmission Standard (METS, http://www.loc.gov/standards/mets/) is an XML metadata standard intended to represent information about complex digital objects.
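A minimal Dublin Core description expressed in XML can be generated with Python's standard library. The record below is an illustrative sketch using a handful of the fifteen DC elements; the element values are invented for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC)

def dublin_core_record(fields):
    """Build a simple XML record from Dublin Core element/value pairs."""
    root = ET.Element("record")
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")

record = dublin_core_record([
    ("title", "A Guide to Web Preservation"),   # invented example values
    ("creator", "Example Archivist"),
    ("date", "2010"),
    ("type", "Text"),
    ("identifier", "http://example.org/guide"),
])
print(record)
```

Because every element is optional and repeatable, the same function can emit as few or as many elements as a given resource warrants, which is exactly what makes Dublin Core attractive for heterogeneous web content.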
METS elements evolved from the early Making of America II (MOA2) project; the standard emerged in 2001, supported by the Library of Congress, sponsored by the Digital Library Federation (DLF), and registered with the National Information Standards Organization (NISO) in 2004. A METS document contains seven major sections, each covering a different aspect of the metadata.38

The Metadata Object Description Schema (MODS, http://www.loc.gov/standards/mods/) was initiated by the MARC21 maintenance agency at the Library of Congress in 2002. MODS elements are richer than DCMI, simpler than the MARC21 bibliographic format, and expressed in XML.39 MODS identifies the broadest facets or features of an object and presents nineteen high-level optional elements.40

The Visual Resources Association Core (VRA Core, http://www.loc.gov/standards/vracore/) was developed in 1996, and the current version, 4.0, was released in 2007. The VRA Core is a widely used standard in art, libraries, and archives for objects such as paintings, drawings, sculpture, architecture, and photographs, as well as books and decorative and performance art.41 The VRA Core contains nineteen elements and nine sub-elements.42

Preservation Metadata Implementation Strategies (PREMIS, http://www.loc.gov/standards/premis/), developed in 2005 and sponsored by the Online Computer Library Center (OCLC) and the Research Libraries Group (RLG), includes a data dictionary and supporting information about metadata. PREMIS defines a set of five interactive core semantic units, or entities, and an XML schema for supporting digital preservation activities. It is concerned not with discovery and access but with common preservation metadata; for descriptive metadata, other standards (Dublin Core, METS, or MODS) need to be used.
The PREMIS data model contains intellectual entities (contents that can be described as a unit, e.g., books, articles, databases), objects (discrete units of information in digital form, which can be files, bitstreams, or any representation), agents (people, organizations, or software), events (actions that involve an object and an agent known to the system), and rights (assertions of rights and permissions).43

It is indisputable that good metadata improve access to the digital objects in a digital repository; therefore, the creation and selection of appropriate metadata make the web archive accessible to the archive user. Structural metadata help to manage the archival collection internally, as well as the related services, but may not always help to discover the primary source of the digital object.44 Many semi-automated metadata generation tools now exist, and their use is crucial for the future, considering the complexity and cost of manual metadata creation.45

Archival format
Web archive initiatives select websites for archiving based on the relevance of the contents and the intended audience of the archived information. The size of web archives varies significantly depending on their scope and the type of content they preserve, e.g., web pages, PDF documents, images, audio, or video files.46 To preserve these contents, a web archive uses storage formats that contain metadata and utilize data compression techniques. The Internet Archive defined the ARC format (http://archive.org/web/researcher/arcfileformat.php), which was later used as a de facto standard. In 2009, the International Organization for Standardization (ISO) established the WARC format (https://goo.gl/0rbwsn) as the official standard for web archiving. Approximately 54 percent of web archive initiatives apply the ARC and WARC formats for archiving.
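WARC is a plain-text record format: each record is a header block of `Name: value` lines followed by a content block, with CRLF delimiters. The sketch below assembles a simplified WARC-style response record by hand to make the layout concrete; real archives are written with dedicated tooling, and a fully conformant record carries more detail than shown here.

```python
import uuid

def warc_response_record(target_uri, date, http_payload):
    """Assemble a minimal WARC/1.0 response record as bytes.

    Simplified for illustration: production archives use dedicated
    WARC-writing libraries rather than hand-built records."""
    body = http_payload.encode("utf-8")
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", date),                  # UTC timestamp of capture
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(body))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers)
    # Header block, blank line, content block, then two CRLFs end the record.
    return head.encode("utf-8") + b"\r\n" + body + b"\r\n\r\n"

record = warc_response_record(
    "http://example.org/",                    # hypothetical capture
    "2019-03-01T00:00:00Z",
    "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi",
)
print(record.decode("utf-8"))
```

Storing the raw HTTP response inside the record is what lets standard tools replay a capture later exactly as the server delivered it.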
The use of standard formats helps archivists by facilitating the creation of collaborative tools, such as search engines and UI utilities, to manipulate the archived data efficiently.47

Information dissemination mechanisms
A well-defined preservation process leads to a well-organized web archive that is easy to maintain and from which a specific digital object is easy to retrieve using information dissemination techniques. Poor search results are one of the main problems in information dissemination for web archives: users of a web archive spend excessive time retrieving the documents or information that would satisfy their queries. Archivists are more concerned with "ofness" (what collections are made up of), whereas archive users are concerned with "aboutness" (what collections are about).48 To exploit the full potential of web archives, a usable interface is needed to help the user search the archive for a specific digital object. Full-text and keyword search are the dominant ways to search an unstructured information repository, as is evident from online search engines, and the quality of the results returned for user queries depends on the ranking tools.49 Access tools and techniques are attracting researchers' attention; approximately 82 percent of European web archives concentrate on such tools, which makes these archives easily accessible.50 The Lucene full-text search engine and its extension NutchWAX are widely used in web archiving.
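Keyword search over an archive rests on an inverted index: a map from each term to the documents containing it. A minimal sketch of the general technique (not Lucene's implementation), over an invented document collection:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    results = None
    for term in query.lower().split():
        postings = index.get(term, set())
        results = postings if results is None else results & postings
    return sorted(results or [])

docs = {  # invented archive contents
    "d1": "web archive preservation process",
    "d2": "digital preservation of news",
    "d3": "web crawler for the archive",
}
index = build_index(docs)
print(search(index, "archive web"))   # ['d1', 'd3']
```

Real engines add tokenization, stemming, and positional data, but intersecting per-term posting sets is the core of keyword retrieval.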
Moreover, by combining the semantic descriptions that already exist in, or are implicit within, the descriptive metadata, reasoning-based or semantic searching of the archival collection can open novel possibilities for retrieving and browsing archival content.51 Even in the current era of digital archives, mobile services are being adopted in digital libraries; for example, access to e-books, library databases, catalogs, and text messaging are common mobile services offered in university libraries.52

In a massive repository, a user query retrieves millions of documents, which makes it difficult for users to identify the most relevant information. To overcome this problem, a ranking model estimates the relevance of the results to the user's query using specified criteria and sorts the results, placing the most relevant at the top.53 A number of ranking models exist in the literature: conventional ranking models, e.g., TF-IDF and BM25F; temporal ranking models, e.g., PageRank; and learning-to-rank models, e.g., L2R.

The findings of the systematic approach for web preservation are used to automate the process of digital news story preservation. The steps of the proposed model have been carefully adapted to develop a tool that can add contextual information to the stories to be preserved.

Digital news stories preservation framework
The advancement of web technologies and the maturation of the internet attract news readers to access news online, provided by multiple sources, and to obtain the desired information comprehensively. The amount of news published online has grown rapidly, and it is cumbersome for an individual to browse all online sources for relevant news articles. News generation in the digital environment is no longer a periodic process with a fixed single output, such as a printed newspaper.
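The conventional TF-IDF model mentioned above scores a document by summing, for each query term, the term's frequency in the document weighted by the inverse of how many documents contain it. A small, self-contained sketch over invented documents, using the basic tf * log(N/df) weighting:

```python
import math
from collections import Counter

def tfidf_scores(docs, query):
    """Rank documents against a query with a basic TF-IDF scheme.

    tf = term count in the document; idf = log(N / df)."""
    n = len(docs)
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    df = Counter()                       # document frequency per term
    for terms in tokenized.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        scores[doc_id] = sum(
            tf[t] * math.log(n / df[t])
            for t in query.lower().split()
            if df[t]                     # skip terms absent everywhere
        )
    # Most relevant document first.
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = {  # invented example collection
    "d1": "web archive preservation",
    "d2": "archive of digital news archive",
    "d3": "digital libraries and metadata",
}
print(tfidf_scores(docs, "archive"))
```

Here "archive" appears twice in d2 and once in d1, so d2 ranks first; rare query terms score higher than common ones because log(N/df) shrinks as more documents contain the term.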
News is now generated and updated online instantly, in a continuous fashion. However, for several reasons, such as the short lifespan of digital information and the speed at which information is generated, it has become vital to preserve digital news for the long term. Digital preservation comprises the various actions needed to ensure that digital information remains accessible and usable for as long as it is considered important.54 Libraries and archives carefully digitize and preserve newspapers, regarding them as a good source for knowing history, and many approaches have been developed to preserve digital information for the long term. The lifespan of news stories published online varies from one newspaper to another, i.e., from one day to a month. Although a newspaper may be backed up and archived by the news publisher or by national archives, in the future it will be difficult to access particular information published in various newspapers regarding the same news story. The issues become even more complicated if a story is to be tracked through an archive of many newspapers, which requires different access technologies.

The digital news story preservation (DNSP) framework was introduced to preserve digital news articles published online by multiple sources.55 The DNSP framework is planned around the proposed step-by-step systematic approach for web preservation in order to develop a well-organized web archive. Initially, the main objectives defined for the DNSP framework are:
• To initiate a well-organized, national-level digital news archive of multiple news sources.
• To normalize news articles to a common format during preservation for future use.
• To extract explicit and implicit metadata, which will be helpful when ingesting stories into the archive and browsing the archive in the future.
• To introduce content-based similarity measures to link digital news articles during preservation.
The digital news story extractor (DNSE) is a tool developed to facilitate the extraction of news stories from online newspapers and their migration to a normalized format for preservation. The normalization also includes a step to add metadata to the digital news stories archive (DNSA) for future use.56 To facilitate the accessibility of news articles preserved from multiple sources, mechanisms need to be adopted for linking the archived digital news articles. An effective term-based approach, the "common ratio measure for stories (CRMS)," was introduced for linking similar news articles in the DNSA during the preservation process.57 The approach was analyzed empirically, and its results were compared to obtain conclusive arguments: the initial results, computed automatically using the common ratio measure, are encouraging when compared with similarity judgments of news articles made by humans. The results are generalized by defining a threshold value based on multiple experimental results using the proposed approach. Work is currently underway to extend the scope of the DNSA to two languages, i.e., Urdu and English, along with content-based similarity measures to link news articles published in Urdu and English. Moreover, research is underway to develop tools that exploit the linkage created among stories during the preservation process for search and retrieval tasks.

Summary
Effective strategic planning is critical in creating web archives; hence, it requires a well-understood and well-planned preservation process. The process should result in a well-organized web archive that includes not only the content to be preserved but also the contextual information required to interpret that content.
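The text does not spell out the CRMS formula, but term-based story linking in general works by comparing the word sets of two articles and linking them when the overlap crosses a threshold. The sketch below uses plain Jaccard overlap as a stand-in measure, not the CRMS itself; the articles and the 0.2 threshold are invented for illustration.

```python
def term_overlap(a, b):
    """Jaccard similarity between the term sets of two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def link_stories(stories, threshold=0.2):
    """Return pairs of story ids whose term overlap crosses the threshold."""
    ids = sorted(stories)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if term_overlap(stories[x], stories[y]) >= threshold
    ]

stories = {  # invented example articles
    "s1": "floods hit the northern region after heavy rain",
    "s2": "heavy rain causes floods in the northern region",
    "s3": "university announces admission schedule",
}
print(link_stories(stories))   # [('s1', 's2')]
```

Computing such links at ingest time, as the DNSP framework does during preservation, means a future reader can jump between versions of the same story across newspapers without a separate post-hoc matching pass.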
The study attempts to answer many questions to guide archivists and related personnel: How should the web preservation process be led effectively? How should the preservation process be initiated? How should one proceed through the different steps? What techniques may help to create a well-organized web archive? How can the archived information be used to its greatest potential? To answer these questions, the study presents an appropriate step-by-step process for web preservation leading to a well-organized web archive. The targeted goal of each step is identified by researching the existing approaches that can be adopted, and the possible techniques for those approaches are discussed in detail for each step.

References
1 “World Wide Web Size,” The Size of the World Wide Web, visited on Jan 31, 2019, http://www.worldwidewebsize.com/.
2 Brian F. Lavoie, “The Open Archival Information System Reference Model: Introductory Guide,” Microform & Imaging Review 33, no. 2 (2004): 68-81; Alexandros Ntoulas, Junghoo Cho, and Christopher Olston, “What's New on the Web? The Evolution of the Web from a Search Engine Perspective,” in Proceedings of the 13th International Conference on World Wide Web (New York, NY: ACM, 2004), 1-12.
3 Teru Agata et al., “Life Span of Web Pages: A Survey of 10 Million Pages Collected in 2001,” IEEE/ACM Joint Conference on Digital Libraries (IEEE, 2014), 463-64, https://doi.org/10.1109/jcdl.2014.6970226.
4 Timothy Robert Hart and Denise de Vries, “Metadata Provenance and Vulnerability,” Information Technology and Libraries 36, no. 4 (Dec. 2017): 24-33, https://doi.org/10.6017/ital.v36i4.10146.
5 Claire Warwick et al., “Library and Information Resources and Users of Digital Resources in the Humanities,” Program 42, no. 1 (2008): 5-27, https://doi.org/10.1108/00330330810851555.
6 Lavoie, “Open Archival Information System Reference Model.”
7 Susan Farrell, K. Ashley, and R.
Davis, “A Guide to Web Preservation”: Practical Advice for Web and Records Managers Based on Best Practices from the JISC-Funded PoWR Project (2010), https://jiscpowr.jiscinvolve.org/wp/files/2010/06/guide-2010-final.pdf.
8 Lavoie, “Open Archival Information System Reference Model”; Farrell, Ashley, and Davis, “Guide to Web Preservation.”
9 Peter Lyman, “Archiving the World Wide Web,” Washington, Library of Congress (2002), https://www.clir.org/pubs/reports/pub106/web/.
10 Diomidis Spinellis, “The Decay and Failures of Web References,” Communications of the ACM 46, no. 1 (2003): 71-77, https://dl.acm.org/citation.cfm?doid=602421.602422.
11 Digital Archive for Chinese Studies (DACHS), https://www.zo.uni-heidelberg.de/boa/digital_resources/dachs/index_en.html, visited on Jan 31, 2019.
12 Julien Masanès, “Web Archiving Methods and Approaches: A Comparative Study,” Library Trends 54, no. 1 (2005): 72-90, https://doi.org/10.1353/lib.2006.0005.
13 Hanno Lecher, “Small Scale Academic Web Archiving: DACHS,” in Web Archiving (Berlin/Heidelberg: Springer, 2006), 213-25, https://doi.org/10.1007/978-3-540-46332-0_10.
14 Daniel Gomes et al., “Introducing the Portuguese Web Archive Initiative,” in 8th International Web Archiving Workshop (Berlin/Heidelberg: Springer, 2009).
15 Gerrit Voerman et al., “Archiving the Web: Political Party Web Sites in the Netherlands,” European Political Science 2, no. 1 (2002): 68-75, https://doi.org/10.1057/eps.2002.51.
16 Sonja Gabriel, “Public Sector Records Management: A Practical Guide,” Records Management Journal 18, no. 2 (2008), https://doi.org/10.1108/00242530810911914.
17 Farrell, Ashley, and Davis, “Guide to Web Preservation.”
18 Jung-ran Park and Andrew Brenza, “Evaluation of Semi-Automatic Metadata Generation Tools: A Survey of the Current State of the Art,” Information Technology and Libraries 34, no.
3 (Sept. 2015): 22-42, https://doi.org/10.6017/ital.v34i3.5889.
19 Muzammil Khan and Arif Ur Rahman, “Digital News Story Preservation Framework,” in Digital Libraries: Providing Quality Information: 17th International Conference on Asia-Pacific Digital Libraries, ICADL 2015, Seoul, Korea, December 9-12, 2015, Proceedings, vol. 9469 (Springer, 2015), 350-52, https://doi.org/10.1007/978-3-319-27974-9; Muzammil Khan, “Using Text Processing Techniques for Linking News Stories for Digital Preservation,” PhD thesis, Faculty of Computer Science, Preston University Kohat, Islamabad Campus, HEC Pakistan, 2018.
20 Dennis Dimick, “Adobe Acrobat Captures the Web,” Washington Apple Pi Journal (1999): 23-25.
21 Trupti Udapure, Ravindra D. Kale, and Rajesh C. Dharmik, “Study of Web Crawler and Its Different Types,” IOSR Journal of Computer Engineering (IOSR-JCE) 16, no. 1 (2014): 01-05, https://doi.org/10.9790/0661-16160105.
22 Dora Biblarz et al., “Guidelines for a Collection Development Policy Using the Conspectus Model,” International Federation of Library Associations and Institutions, Section on Acquisition and Collection Development (2001).
23 Farrell, Ashley, and Davis, “Guide to Web Preservation”; E. Pinsent et al., “PoWR: The Preservation of Web Resources Handbook,” http://jisc.ac.uk/publications/programmerelated/2008/powrhandbook.aspx (2010); Michael Day, “Preserving the Fabric of Our Lives: A Survey of Web Preservation Initiatives,” Lecture Notes in Computer Science (Berlin/Heidelberg: Springer, 2003): 461-72, https://doi.org/10.1007/978-3-540-45175-4_42.
24 Pinsent et al., “PoWR”; Day, “Preserving the Fabric.”
25 Allan Arvidson, “The Royal Swedish Web Archive: A Complete Collection of Web Pages,” International Preservation News (2001): 10-12.
26 Andreas Rauber, Andreas Aschenbrenner, and Oliver Witvoet, “Austrian Online Archive Processing: Analyzing Archives of the World Wide Web,” Research and Advanced Technology for Digital Libraries, ECDL 2002,
Lecture Notes in Computer Science, vol. 2458 (Berlin/Heidelberg: Springer, 2002), 16-31, https://doi.org/10.1007/3-540-45747-x_2.
27 William Arms, “Collecting and Preserving the Web: The MINERVA Prototype,” RLG DigiNews 5, no. 2 (2001).
28 Sonya Betz and Robyn Hall, “Self-Archiving with Ease in an Institutional Repository: Micro Interactions and the User Experience,” Information Technology and Libraries 34, no. 3 (Sept. 2015): 43-58, https://doi.org/10.6017/ital.v34i3.5900.
29 Serge Abiteboul et al., “A First Experience in Archiving the French Web,” in International Conference on Theory and Practice of Digital Libraries (Berlin/Heidelberg: Springer, 2002), 1-15, https://doi.org/10.1007/3-540-45747-x_1; Sergey Brin and Lawrence Page, “Reprint of: The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Networks 56, no. 18 (2012): 3825-33, https://doi.org/10.1016/j.comnet.2012.10.007.
30 Masanès, “Web Archiving.”
31 NISO Press, “Understanding Metadata,” National Information Standards (2004), http://www.niso.org/publications/understanding-metadata.
32 Ibid.
33 Jane Greenberg, “Understanding Metadata and Metadata Schemes,” Cataloging & Classification Quarterly 40, no. 3-4 (2009): 17-36, https://doi.org/10.1300/j104v40n03_02.
34 Michael Day, “Preservation Metadata Initiatives: Practicality, Sustainability, and Interoperability,” Archivschule Marburg (2004): 91-117.
35 Jenn Riley, Glossary of Metadata Standards (2010).
36 Corey Harper, “Dublin Core Metadata Initiative: Beyond the Element Set,” Information Standards Quarterly 22, no. 1 (2010): 20-31.
37 Jane Greenberg, “Dublin Core: History, Key Concepts, and Evolving Context (Part One),” slide presentation at DC-2010, International Conference on Dublin Core and Metadata Applications, Pittsburgh, PA (2010).
38 Morgan V. Cundiff, “An Introduction to the Metadata Encoding and Transmission Standard (METS),” Library Hi Tech 22, no.
1 (2004): 52-64, https://doi.org/10.1108/07378830410524495; leta negandhi, “metadata encoding and transmission standard (mets),”in texas conference on digital libraries, tcdl-2012 (2012). 39 sally h. mccallum, “an introduction to the metadata object description schema (mods),” library hi tech 22, no. 1 (2004): 82-88, https://doi.org/10.1108/07378830410524521. 40 r. gartner, “mode: metadata object description schema,” jisc techwatch report tsw (2003): 03-06. www.loc.gov/standards/mods/. 41 vra-core, “an introduction of vra core,” http://www.loc.gov/standards/vracore/vra core4 intro.pdf, created: oct 2014. 42 vra-core, “vra core element outline,” http://www.loc.gov/standards/vracore/vra core4 outline.pdf, created: feb 2007. 43 priscilla caplan, “understanding premis,” washington dc, usa: library of congress, (2009), https://www.loc.gov/standards/premis/understanding-premis.pdf; j. relay, “an introduction to premis,” singapore ipress tutorial, (2011), http://www.loc.gov/standards/premis/premistutorial ipres2011 singapore.pdf. systematic approach towards web preservation | khan and ur rahman 90 https://doi.org/10.6017/ital.v38i1.10181 44 jennifer schaffner, “the metadata is the interface: better description for better discovery of archives and special collections, synthesized from user studies,” making archival and special collections more accessible, 85 (2015). 45 joao miranda and daniel gomes, “trends in web characteristics,” in web congress, 2009. laweb'09. latin american, (ieee, 2009), 146-53, https://doi.org/10.1109/la-web.2009.28. 46 daniel gomes, joão miranda, and miguel costa, “a survey on web archiving initiatives,” research and advanced technology for digital libraries (2011): 408-20, https://doi.org/10.1007/978-3-642-24469-8_41. 47 ibid. 48 schaffner, “metadata is the interface.” 49 miguel costa and mário j. 
silva, “evaluating web archive search systems,” in international conference on web information systems engineering (berlin/heidelberg: springer, 2012), 440454. https://doi.org/10.1007/978-3-642-35063-4_32. 50 foundation, i, “web archiving in europe,” technical report, commercenet labs (2010). 51 georgia solomou and dimitrios koutsomitropoulos, “towards an evaluation of semantic searching in digital repositories: a dspace case-study,” program 49, no. 1 (2015): 63-90, https://doi.org/10.1108/prog-07-2013-0037. 52 liu yan quan and sarah briggs, “a library in the palm of your hand: mobile services in top 100 university libraries,” information technology and libraries 34, no. 2 (june 2015): 133, https://doi.org/10.6017/ital.v34i2.5650. 53 ricardo baeza-yates and berthier ribeiro-neto, modern information retrieval 463. (new york: acm pr., 1999). 54 daniel burda and frank teuteberg, “sustaining accessibility of information through digital preservation: a literature review,” journal of information science, 39, no. 4 (2013): 442-58, https://doi.org/10.1177/0165551513480107. 55 muzammil khan et al., “normalizing digital news-stories for preservation,” in digital information management (icdim), 2016 eleventh international conference on (ieee, 2016), 8590, https://doi.org/10.1109/icdim.2016.7829785. 56 khan, et al., “normalizing digital news.” 57 muzammil khan, arif ur rahman, and m. daud awan, “term-based approach for linking digital news stories,” in italian research conference on digital libraries (cham, switzerland: springer, 2018), 127-38, https://doi.org/10.1007/978-3-319-73165-0_13. generating collaborative systems for digital libraries | visser and ball 187 marijke visser and mary alice ball the middle mile: the role of the public library in ensuring access to broadband of fundamentally altering culture and society. in some circles the changes happen in real time as new web-based applications are developed, adopted, and integrated into the user’s daily life. 
these users are the early adopters; the internet cognoscenti. second tier users appreciate the availability of online resources and use a mix of devices to access internet content but vary in the extent to which they try the latest application or device. the third tier users also vary in the amount they access the internet but have generally not embraced its full potential, from not seeking out readily available resources to not connecting at all.1 regardless of the degree to which they access the internet, all of these users require basic technology skills and a robust underlying infrastructure. since the introduction of web 2.0, the number and type of participatory web-based applications has continued to grow. many people are eagerly taking part in creating an increasing variety of web-based content because the basic tools to do so are widely available. the amateur, creating and sharing for primarily personal reasons, has the ability to reach an audience of unprecedented size. in turn, the internet audience, or virtual audience, can select from a vast menu of formats, including multimedia and print. with print resources disappearing, it is increasingly likely for an individual to only be able to access necessary material online. web-based resources are unique in that they enable an undetermined number of people, personally connected or complete strangers, to interact with and manipulate the content thereby creating something new with each interaction and subsequent iteration. many of these new resources and applications require much more bandwidth than traditional print resources. with the necessary technology no longer out of reach, a crosssection of society is affecting the course the twenty-first century is taking vis à vis how information is created, who can create it, and how we share it.2 in turn, who can access web-based content and who decides how it can be accessed become critical questions to answer. 
as people become more adept at using web-based tools and eager to try new applications, the need for greater broadband will intensify. the economic downturn is having a marked effect on people’s internet use. if there was a preexisting problem with inadequate access to broadband, current circumstances exacerbate it to where it needs immediate attention. access to broadband internet today increases this paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. it examines the culture of information in 2010, and then asks what it means if individuals are online or not. the paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world. i n the last twenty years library collections have evolved from being predominantly print-based to ones that have a significant digital component. this trend, which has a direct impact on library services, has only accelerated with the advent of web 2.0 technologies and participatory content creation. cutting-edge libraries with next generation catalogs encourage patrons to post reviews, contribute videos, and write on library blogs and wikis. even less adventuresome institutions offer a variety of electronic databases licensed from multiple publishers and vendors. the piece of these library portfolios that is at best ignored and at worst vilified is the infrastructure that enables internet connectivity. in 2010, broadband telecommunication is recognized as essential to access the full range of information resources. telecommunications experts articulate their concerns about the digital divide by focusing on firstand last-mile issues of bringing fiber and cable to end users. 
the library, particularly the public library, represents the metaphorical middle mile providing the public with access to rich information content. equally important, it provides technical knowledge, subject matter expertise, and general training and support to library users. this paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. it examines the culture of information in 2010, and then asks what it means if individuals are online or not. the paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world. ■■ the culture of information information today is dynamic. as the internet continues on its fast paced, evolutionary track, what we call ‘information’ fluctuates with each emerging web-based technology. theoretically a democratic platform, the internet and its user-generated content is in the process marijke visser (mvisser@alawash.org) is information technology policy analyst and mary alice ball (maryaliceball@yahoo .com) former chair, telecommunications subcommittee, office for information technology policy, american library association, washington, dc. 188 information technology and libraries | december 2010 the geographical location of a community will also influence what kind of internet service is available because of deployment costs. these costs are typically reflected in varying prices to consumers. in addition to the physical layout of an area, current federal telecommunications policies limit the degree to which incentives can be used on the local level.7 encouraging competition between isps, including municipal electric utilities, incumbent local exchange carriers, and national cable companies, for example, requires coordination between local needs and state and federal policies. 
such coordinated efforts are inherently difficult when taking into consideration the numerous differences between locales. ultimately, though, all of these factors influence the price end users must pay for internet access. with necessary infrastructure and telecommunications policies in place, there are individual behaviors that also affect broadband adoption. according to the pew study, “home broadband adoption 2008,” 62 percent of dial-up users are not interested in switching to broadband.8 clearly there is a segment of the population that has not yet found personal relevance to high-speed access to online resources. in part this may be because they only have experience with dial-up connections. depending on dial-up gives the user an inherently inferior experience because bandwidth requirements to download a document or view a website with multimedia features automatically prevent these users from accessing the same resources as a user with a high-speed connection. a dial-up user would not necessarily be aware of this difference. if this is the only experience a user has it might be enough to deter broadband adoption, especially if there are other contributing factors like lack of technical comfort or availability of relevant content. motivation to use the internet is influenced by the extent to which individuals find content personally relevant. whether it is searching for a job and filling out an application, looking at pictures of grandchildren, using skype to talk to a family member deployed in iraq, researching healthcare providers, updating a personal webpage, or streaming video, people who do these things have discovered personally relevant internet content and applications. understanding the potential relevance of going online makes it more likely that someone would experiment with other applications, thus increasing both the familiarity with what is available and the comfort level with accessing it. 
without relevant content, there is little motivation for someone not inclined to experiment with internet technology to cross what amounts to a significant hurdle to adoption. anthony wilhelm argues in a 2003 article discussing the growing digital divide that culturally relevant content is critical in increasing the likelihood that non-users will want to access web-based resources.9 the scope of the issue of providing culturally relevant content is underscored in the 2008 pew study, the amount of information and variety of formats available to the user. in turn more content is being distributed as users create and share original content.3 businesses, nonprofits, municipal agencies, and educational institutions appreciate that by putting their resources online they reach a broader segment of their constituency. this approach to reaching an audience works provided the constituents have their own access to the materials, both physically and intellectually. it is one thing to have an internet connection and another to have the skill set necessary to make productive use of it. as reported in job-seeking in u.s. public libraries in 2009, “less than 44% of the top 100 u.s. retailers accept instore paper applications.”4 municipal, state, and federal agencies are increasingly putting their resources online, including unemployment benefit applications, tax forms, and court documents.5 in addition to online documents, the report finds social service agencies may encourage clients to make appointments and apply for state jobs online.6 many of the processes that are now online require an ability to navigate the complexities of the internet at the same time as navigating difficult forms and websites. the combination of the two can deter someone from retrieving necessary resources or successfully completing a critical procedure. 
while early adopters and policy-makers debate the issues surrounding internet access, the other strata of society, knowingly or not and to varying degrees, are enmeshed in the outcomes of these ongoing discussions because their right to information is at stake. ■■ barriers to broadband access by condensing internet access issues to focus on the availability of adequate and sustainable broadband, it is possible to pinpoint four significant barriers to access: price, availability, perceived relevance, and technical skill level. the first two barriers are determined by existing telecommunications infrastructure as well as local, state, and federal telecommunications policies. the latter barriers are influenced by individual behaviors. both divisions deserve attention. if local infrastructure and the internet service provider (isp) options do not support broadband access to all areas within its boundaries, the result will be that some community members can have broadband services at home while others must rely on work or public access computers. it is important to determine what kind of broadband services are available (e.g., cable, dsl, fiber, satellite) and if they are robust enough to support the activities of the community. infrastructure must already be in place or there must be economic incentive for isps to invest in improving current infrastructure or in installing new infrastructure. generating collaborative systems for digital libraries | visser and ball 189 at all. success hinges on understanding that each community is unique, on leveraging its strengths, and on ameliorating its weaknesses. local government can play a significant role in the availability of broadband access. from a municipal perspective, emphasizing the role of broadband as a factor in economic development can help define how the municipality should most effectively advocate for broadband deployment and adoption. 
gillett offers four initiatives appropriate for stimulating broadband from a local viewpoint. municipal governments can ■■ become leaders in developing locally relevant internet content and using broadband in their own services; ■■ adopt policies that make it easier for isps to offer broadband; ■■ subsidize broadband users and/or isps; or ■■ become involved in providing the infrastructure or services themselves.12 individually or in combination these four initiatives underscore the fact that government awareness of the possibilities for community growth made possible by broadband access can lead to local government support for the initiatives of other local agencies, including nonprofit, municipal, or small businesses. agencies partnering to support community needs can provide evidence to local policy makers that broadband is essential for community success. once the municipality sees the potential for social and economic development, it is more likely to support policies that stimulate broadband buildout. building strong local partnerships will set the stage for the development of a sustainable broadband initiative as the different stakeholders share perspectives that take into account a variety of necessary components. when the time comes to implement a strategy, not only will different perspectives have been included, the plan will have champions to speak for it: the government, isps, public and private agencies, and community members. it is important to know which constituents are already engaged in supporting community broadband initiatives and which should be tapped. the ultimate purpose in establishing broadband internet access in a community is to benefit the individual community members, thereby stimulating local economic development. key players need to represent agencies that recognize the individual voice. 
a 2004 study led by strover provides an example of the importance of engaging local community leaders and agencies in developing a successful broadband access project.13 the study looked at thirty-six communities that received state funding to establish community technology centers (ctc). it addressed the effective use and management of ctcs and called attention to the inadequacy of supplying the hardware without community support which found that of the 27 percent of adult americans who are not internet users, 33 percent report they are not interested in going online.10 that pew can report similar information five years after the wilhelm article identifies a barrier to equitable access that has not been adequately resolved. ■■ models for sustainable broadband availability in discussing broadband, the question of what constitutes broadband inevitably arises. gillett, lehr, and osoria, in “local government broadband initiatives,” offers a functional definition: “access is ‘broadband’ if it represents a noticeable improvement over standard dial-up and, once in place, is no longer perceived as the limiting constraint on what can be done over the internet.”11 while this definition works in relationship to dial-up, it is flexible enough to apply to all situations by focusing on “a noticeable improvement” and “no longer perceived as the limiting constraint” (added emphasis). ensuring sustainable broadband access necessitates anticipating future demand. short sighted definitions, applicable at a set moment in time, limit long-term viability of alternative solutions. devising a sustainable solution calls for careful scrutiny of alternative models, because the stakes are so high in the broadband debate. there are many different players involved in constructing information policies. this does not mean, however, that their perspectives are mutually exclusive. 
in debates with multiple perspectives, it is important to involve stakeholders who are aligned with the ultimate goal: assuring access to quality broadband to anyone going online. what is successful for one community may be entirely inappropriate in another; designing a successful system requires examining and comparing a range of scenarios. existing circumstances may predetermine a particular starting point, but one first step is to evaluate best practices currently in place in a variety of communities to come up with a plan that meets the unique criteria of the community in question. sustainable broadband solutions need to be developed with local constituents in mind and successful solutions will incorporate the realities of current and future local technologies and infrastructure as well as local, state, and federal information policies. presupposing that the goal is to provide the community with the best possible option(s) for quality broadband access, these are key considerations to take into account when devising the plan. in addition to the technological and infrastructure issues, within a community there will be a combination of ways people access the internet. there will be those who have home access, those who need public access, and those who do not seek access 190 information technology and libraries | december 2010 the current emphasis on universal broadband depends on selecting the best of the alternative plans according to carefully vetted criteria in order to develop a flexible and forward-thinking course of action. can we let people remain without access to robust broadband and the necessary skill set to use it effectively? no. as more and more resources critical to basic life tasks are accessible only online, those individuals that face challenges to going online will likely be socially and economically disadvantaged when compared to their online counterparts. 
recognition of this potential for intensifying digital divide is recognized in the federal communication commission’s (fcc) national broadband plan (nbp) released in march 2010.18 the nbp states six national broadband goals, the third of which is “every american should have affordable access to robust broadband service, and the means and skills to subscribe if they so choose.”19 research conducted for the recommendations in the nbp was comprehensive in scope including voices from industry, public interest, academia, and municipal and state government. responses to more than thirty public notices issued by the fcc provide evidence of wide concern from a variety of perspectives that broadband access should become ubiquitous if the united states is to be a competitive force in the twentyfirst century. access to essential information such as government, public safety, educational, and economic resources requires a broadband connection to the internet. it is incumbent on government officials, isps, and community organizations to share ideas and resources to achieve a solution for providing their communities with robust and sustainable broadband. it is not necessary to have all users up to par with the early adopters. there is not a one-size-fits-all approach to wanting to be connected, nor is there a one-size-fits-all solution to providing access. what is important is that an individual can go online via a robust, high-speed connection that meets that individual’s needs at that moment. what this means for finding solutions is ■■ there needs to be a range of solutions to meet the needs of individual communities; ■■ they need to be flexible enough to meet the evolving needs of these communities as applications and online content continue to change; and ■■ they must be sustainable for the long term so that the community is prepared to meet future needs that are as yet unknown. 
solutions to providing broadband internet access will be most successful when they are designed starting at the local level. community needs vary according to local demographics, geography, existing infrastructure, types of service providers, and how state and federal systems in place. users need a support system that highlights opportunities available via the internet and that provides help when they run into problems. access is more than providing the infrastructure and hardware. the potential users must also find content that is culturally relevant in an environment that supports local needs and expectations. strover found the most successful ctcs were located in places that “actively attracted people for other social and entertaining reasons.”14 in other words, the ctcs did not operate in a vacuum devoid of social context. successful adoption of the ctcs as a resource for information was dependent on the targeted population finding culturally relevant content in a supportive environment. an additional point made in the study showed that without strong community leadership, there was not significant use of the ctc even when placed in an already established community center.15 this has significant implications for what constitutes access as libraries plan broadband initiatives. investments in technology and a national commitment to ensure universal access to these new technologies in the 1990s provide the current policy framework. as suggested by wilhelm in 2003, to continue to move forward the national agenda needs to focus on updating policies to fit new information circumstances as they arise. today’s information policy debates should emphasize a similar focus. 
beyond accelerating broadband deployment into underserved areas, wilhelm suggests there needs to be support for training and content development that guarantees communities will actually use and benefit from having broadband deployed in their area.16 technology training and support for local agencies that provide the public with internet access, as well as opportunities for the individuals themselves, is essential if policies are going to actually lead to useful broadband adoption. individual and agency internet access and adoption require investment beyond infrastructure; they depend on having both culturally relevant content and the information literacy skills necessary to benefit from it. ■■ finding the right solution though it may have taken an economic crisis to bring broadband discussions into the living room, the result is causing renewed interest in a long-standing issue. many states have formed broadband task forces or councils to address the lack of adequate broadband access at the state level and, on the national front, broadband was a key component of the american recovery and reinvestment act of 2009.17 the issue changes as technologies evolve but the underlying tenet of providing people access to the information and resources they need to be productive members of society is the same. what becomes of generating collaborative systems for digital libraries | visser and ball 191 difficult to measure, these kinds of social and cultural capital are important elements in ongoing debates about uses and consequences of broadband access. an ongoing challenge for those interested in the social, economic, and policy consequences of modern information networks will be to keep up with changing notions of what it means to be connected in cyberspace.”20 the social contexts in which a broadband plan will be enacted influence the appropriateness of different scenarios and should help guide which ones are implemented. 
engaging a variety of stakeholders will increase the likelihood of positive outcomes as community members embrace the opportunities provided by broadband internet access. it is difficult, however, to anticipate the outcomes that may occur as users become more familiar with the resources and achieve a higher level of comfort with technology. ramirez states, the “unexpected outcomes” section of many evaluation reports tends to be rich with anecdotes . . . . the unexpected, the emergent, the socially constructed innovations seem to be, to a large extent, off the radar screen, and yet they often contain relevant evidence of how people embrace technology and how they innovate once they discover its potential.21 community members have the most to gain from having broadband internet access. including them will increase the community’s return on its investment as they take advantage of the available resources. ramirez suggests that “participatory, learning, and adaptive policy approaches” will guide the community toward developing communication technology policies that lead to a vibrant future for individuals and community alike.22 as success stories increase, the aggregation of local communities’ social and economic growth will lead to a net sum gain for the nation as a whole. ■■ the role of the library public libraries play an important role in providing internet access to their community members. according to a 2008 study, the public library is the only outlet for no-fee internet access in 72.5 percent of communities nationwide; in rural communities the number goes up to 82.0 percent.23 beyond having desktop or, in some cases, wireless access, public libraries offer invaluable user support in the form of technical training and locally relevant content. libraries provide a secondary community resource for other local agencies who can point their clients to the library for no-fee internet access. 
in today’s economy where anecdotal reports show an increase in library use, particularly internet use, the role of the public policies mesh with local ordinances. local stakeholders best understand the complex interworking of their community and are aware of who should be included in the decision-making process. including a local perspective will also increase the likelihood that as community needs change, new issues will be brought to the attention of policy makers and agencies who advocate for the individual community members. community agencies that already are familiar with local needs, abilities, and expectations are logical groups to be part of developing a successful local broadband access strategy. the library exemplifies a community resource whose expertise in local issues can inform information policy discussions on local, state, and federal levels. as a natural extension of library service, libraries offer the added value support necessary for many users to successfully navigate the internet. the library is an established community hub for informational resources and provides dedicated staff, technology training opportunities, and no-fee public access computers with an internet connection. libraries in many communities are creating locally relevant web-based content as well as linking to other community resources on their own websites. seeking a partnership with the local library will augment a community broadband initiative. it is difficult to appreciate the impacts of current information technologies because they change so rapidly there is not enough time to realistically measure the effects of one before it is mixed in with a new innovation. with web-based technologies there is a lag time between what those in the front of the pack are doing online and what those in the rear are experiencing. 
while there is general consensus that broadband internet access is critical in promoting social and economic development in the twenty-first century as is evidenced by the national purposes outlined in the nbp, there is not necessarily agreement on benchmarks for measuring the impacts. three anticipated outcomes of providing community access to broadband are ■■ civic participation will increase; ■■ communities will realize economic growth; and ■■ individual quality of life will improve. when a strategy involves significant financial and energy investments there is a tendency to want palpable results. the success of providing broadband access in a community is challenging to capture. to achieve a level of acceptable success it is necessary to focus on local communities and aggregate anecdotal evidence of incremental changes in public welfare and economic gain. acceptable success is subjective at best but can be usefully defined in context of local constituencies. referring to participation in the development of a vibrant culture, horrigan notes that “while inherently 192 information technology and libraries | december 2010 isolation. an individual must possess skills to navigate the online resources. as users gain an understanding of the potential personal growth and opportunities broadband yields, they will be more likely to seek additional online resources. by stimulating broadband use, the library will contribute to the social and economic health of the community. if the library is to extend its role as the information hub in the community by providing no-fee access to broadband to anyone who walks through the door, the local community must be prepared to support that role. it requires a commitment to encourage build out of appropriate technology necessary for the library to maintain a sustainable internet connection. it necessitates that local communities advocate for national information and communication policies that are pro-library. 
When public policy supports the library's efforts, the local community benefits and society at large can progress. What if the library's own technology needs are not met? The role of the library in its community is becoming increasingly important as more people turn to it for their internet access. Without sufficient revenue, the library will have a difficult time meeting this additional demand for services. In turn, in many libraries increased demand for broadband access stretches the limit of IT support for both the library staff and the patrons needing help at the computers. What will be the fallout from the library not being able to provide the internet services patrons desire and require? Will there be a growing skills difference between people who adopt emerging technologies and incorporate them into their daily lives and those who maintain the technological status quo? What will the social impact be of remaining offline, either completely or only marginally? Can the library be the bridge between those on the edge, those in the middle, and those at the end? With a strong and well-articulated vision for the future, the library can be the link that provides the community with sustainable broadband.
■■ Conclusion
The recent national focus on universal broadband access has provided an opportunity to rectify a lapse in effective information policy. Whether the goal includes facilitating meaningful access, however, continues to be more elusive. As government, organizations, businesses, and individuals rely more heavily on the internet for sharing and receiving information, broadband internet access will continue to increase in importance. Following the status quo will not necessarily lead to more people having broadband access in the long run. The early adopters will continue to stimulate technological innovation which, in turn, will trickle down the ranks of the different user types. Currently, the role of the library as a stable internet provider cannot be overestimated.
To maintain its vital function, however, the library must also resolve infrastructure challenges of its own. Because of the increased demand for access to internet resources, public libraries are finding that their current broadband services are not able to support the demand of their patrons. The issues are twofold: increased patron use means there are often neither sufficient workstations nor broadband speeds to meet patron demand. In 2008, about 82.5 percent of libraries reported an insufficient number of public workstations, and about 57.5 percent reported insufficient broadband speeds.24 To add to these already significant issues, the report indicates libraries are having trouble supporting the necessary information technology (IT) because of either staff time constraints or the lack of a dedicated IT staff.25 Public libraries are facing considerable infrastructure management issues at a time when library use is increasing. Overcoming the challenges successfully will require support on the local, state, and federal level. Here is where the librarian, as someone trained to become inherently familiar with the needs of her local constituency and ethically bound to provide access to a variety of information resources, needs to insert herself into the debate. Librarians need to be ahead of the crowd as the voice that assures content will be readily accessible to those who seek it. Today, the elemental policy issue regarding access to information via the internet hinges on connectivity to a sustainable broadband network. To promote equitable broadband access, the librarian needs to be aware of the pertinent information policies in place or under consideration, and be able to anticipate those in the future. Additionally, she will need to educate local policy makers about the need for broadband in their community. In some circumstances, the librarian will need to move beyond her local community and raise awareness of community access issues on the state and federal level.
The librarian is already able to articulate numerous issues to a variety of stakeholders and can transfer this skill to advocate for sustainable broadband strategies that will succeed in her local community. There are many strata of internet users, from those in the forefront of early adoption to those not interested in being online at all. The early adopters drive the market, which responds by making resources more and more likely to be primarily available only online. As we continue this trend, the social repercussions increase from merely not being able to access entertainment and news to being unable to participate in the knowledge-based society of the twenty-first century. By folding in added-value online access for the community, the library helps increase the likelihood that the community will benefit from broadband being available to the library patrons and, by extension, to the community as a whole. To realize the internet's full potential, access to it cannot be provided in …
Generating Collaborative Systems for Digital Libraries | Visser and Ball 193
… community, the entire community benefits regardless of where and how the individuals go online. The effects of the internet are now becoming broadly social enough that there is a general awareness that the internet is not decoration on contemporary society but a challenge to it.28 Being connected is no longer an optional luxury; to engage in the twenty-first century it is essential. Access to the internet, however, is more than simple connectivity. Successful access requires an understanding of the benefits of going online, technological comfort, information literacy, ongoing support and training, and the availability of culturally relevant content. People are at various levels of internet use, from those eagerly anticipating the next iteration of web-based applications to those hesitant to open an e-mail account. This user spectrum is likely to continue.
Though the starting point may vary depending on the applications that become important to the user in the middle of the spectrum, there will be those out in front and those barely keeping up. The implications of the pervasiveness of the internet are only beginning to be appreciated and understood. Because of their involvement at the cutting edge of internet evolution, librarians can help lead the conversations. Libraries have always been situated in neutral territory within their communities and closely aligned with the public good. Librarians understand the perspective of their patrons and are grounded in their local communities. Librarians can therefore advocate effectively for their communities on issues that may not completely be understood or even recognized as mattering. Connectivity is an issue supremely important to the library, as today access to the full range of information necessitates a broadband connection. Libraries have carved out a role for themselves as a premier internet access provider in the continually evolving online culture. As noted by Bertot, McClure, and Jaeger, the "role of internet access provider for the community is ingrained in the social perceptions of public libraries, and public internet access has become a central part of community perceptions about libraries and the value of the library profession."29 In times of both economic crisis and technological innovation, there are many unknowns. In part because of these two juxtaposed events, the role of the public library is in flux. Additionally, the network of community organizations that libraries link to is becoming more and more complex. It is a time of great opportunity if the library can articulate its role and frame it in relationship to broader society. Evolving internet applications require increasing amounts of bandwidth, and the trend is to make these bandwidth-heavy applications more and more vital to daily life.
One clear path the library community can take … However, the supply of internet resources is unevenly stimulating user demand, and the unequal distribution of broadband access has greater potential for significant negative social consequences. Staying the course and following a haphazard evolution of broadband adoption may, in fact, renew valid concerns about a digital divide. Without an intentional and coordinated approach to developing a broadband strategy, its success is likely to fall short of expectations. The question of how to ensure that internet content is meaningful requires instituting a plan on a very local level, including stakeholders who are familiar with the unique strengths and weaknesses of their community. Strover, in her 2000 article "The First Mile," suggests connectivity issues should be viewed from a first mile perspective, where the focus is on the person accessing the internet and her qualitative experience, rather than from a last mile perspective, which emphasizes ISP, infrastructure, and market concerns.26 Both perspectives are talking about the same physical section of the connection network: the piece that connects the user to the network. According to Strover, distinguishing between the first mile and last mile perspectives is more than an arbitrary argument over semantics. Instead, a first mile perspective represents a shift "in the values and priorities that shape telecommunications policy."27 By switching to a first mile perspective, connectivity issues immediately take into account the social aspects of what it means to be online. Who will bring this perspective to the table? And how will we ascertain what the best approach to supporting the individual voice should be? The first mile perspective is one the library is intimately familiar with, as an organization that traditionally advocates for the first mile of all information policies.
The library is in a key position in the connectivity debate because of its inclination to speak for the user and to be aware of the unique attributes and needs of its local community. As part of its mission, the library takes into account the distinctive needs of its user community when it designs and implements its services. A natural outgrowth of this practice is to be keenly aware of the demographics of the community at large. The library can leverage its knowledge and understanding to create an even greater positive impact on the social, educational, and economic community development made possible by broadband adoption. To extend the first mile perspective analogy, in the connectivity debate the library will play the role of the middle mile: the support system that successfully connects the internet to the consumer. While the target populations for stimulating demand for broadband are really those in the second tier of users, by advocating for the first mile perspective, the library will be advocating for equitable information policies whose implementation has bearing on the early adopters as well. By stimulating demand for broadband within a …
… Initiatives," 538.
12. Ibid., 537–58.
13. Sharon Strover, Gary Chapman, and Jody Waters, "Beyond Community Networking and CTCs: Access, Development, and Public Policy," Telecommunications Policy 28, no. 7/8 (2004): 465–85.
14. Ibid., 483.
15. Ibid.
16. Wilhelm, "Leveraging Sunken Investments in Communications Infrastructure," 282.
17. See, for example, the Virginia Broadband Round Table (http://www.otpba.vi.virginia.gov/broadband_roundtable.shtml), the Ohio Broadband Council (http://www.ohiobroadbandcouncil.org/), and the California Broadband Task Force (http://gov.ca.gov/speech/4596). See www.fcc.gov/recovery/broadband/ for information on broadband initiatives in the American Recovery and Reinvestment Act.
18.
Federal Communications Commission, National Broadband Plan: Connecting America, http://www.broadband.gov/ (accessed Apr. 11, 2010).
19. Ibid.
20. Horrigan, "Broadband: What's All the Fuss About?" 2.
21. Ricardo Ramirez, "Appreciating the Contribution of Broadband ICT with Rural and Remote Communities: Stepping Stones toward an Alternative Paradigm," The Information Society 23 (2007): 86.
22. Ibid., 92.
23. Denise M. Davis, John Carlo Bertot, and Charles R. McClure, "Libraries Connect Communities: Public Library Funding & Technology Access Study 2007–2008," 35, http://www.ala.org/ala/aboutala/offices/ors/plftas/0708/librariesconnectcommunities.pdf (accessed Jan. 24, 2009).
24. John Carlo Bertot et al., "Public Libraries and the Internet 2008: Study Results and Findings," 11, http://www.ii.fsu.edu/projectfiles/plinternet/2008/everything.pdf (accessed Jan. 24, 2009). These numbers represent an increase from the previous year's study, which suggests that libraries, while trying to meet demand, are not able to keep up.
25. Ibid.
26. Sharon Strover, "The First Mile," The Information Society 16, no. 2 (2000): 151–54.
27. Ibid., 151.
28. Clay Shirky, "Here Comes Everybody: The Power of Organizing without Organizations," Berkman Center for Internet & Society (2008), video presentation, http://cyber.law.harvard.edu/interactive/events/2008/02/shirky (retrieved Mar. 1, 2009).
29. John Carlo Bertot, Charles R. McClure, and Paul T. Jaeger, "The Impacts of Free Public Internet Access on Public Library Patrons and Communities," Library Quarterly 78, no. 3 (2008): 286, http://www.journals.uchicago.edu.proxy.ulib.iupui.edu/doi/pdf/10.1086/588445 (accessed Jan. 30, 2009).
… is to develop its role as the middle mile connecting the increasing breadth of internet resources to the general public. The broadband debate has moved out of the background of telecommunication policy and into the center of public attention.
Now is the moment that calls for an information policy advocate who can represent the end user while understanding the complexity of the other stakeholder perspectives. The library undoubtedly has its own share of stakeholders, but over time it is an institution that has maintained a neutral stance within its community, thereby achieving a unique ability to speak for all parties. Those who speak for the library are able to represent the needs of the public, work with a diverse group of stakeholders, and help negotiate a sustainable strategy for providing broadband internet access.
References and Notes
1. Lee Rainie, "2.0 and the Internet World," Internet Librarian 2007, http://www.pewinternet.org/presentations/2007/20-and-the-internet-world.aspx (accessed Mar. 4, 2009). See also John Horrigan, "A Typology of Information and Communication Technology Users," 2007, www.pewinternet.org/~/media//files/reports/2007/pip_ict_typology.pdf.pdf (accessed Feb. 12, 2009).
2. Lawrence Lessig, "Early Creative Commons History, My Version," video blog post, 2008, http://lessig.org/blog/2008/08/early_creative_commons_history.html (accessed Jan. 20, 2009). See the relevant passage from 20:53 through 21:50.
3. John Horrigan, "Broadband: What's All the Fuss About?" 2007, p. 1, http://www.pewinternet.org/~/media/files/reports/2007/broadband%20fuss.pdf.pdf (accessed Feb. 12, 2009).
4. "Job-Seeking in US Public Libraries," Public Library Funding & Technology Access Study, 2009, http://www.ala.org/ala/research/initiatives/plftas/issuesbriefs/brief_jobs_july.pdf (accessed Mar. 27, 2009).
5. Ibid.
6. Ibid.
7. Sharon E. Gillett, William H. Lehr, and Carlos Osorio, "Local Government Broadband Initiatives," Telecommunications Policy 28 (2004): 539.
8. John Horrigan, "Home Broadband Adoption 2008," 10, http://www.pewinternet.org/~/media//files/reports/2008/pip_broadband_2008.pdf (accessed Feb. 12, 2009).
9. Anthony G.
Wilhelm, "Leveraging Sunken Investments in Communications Infrastructure: A Policy Perspective from the United States," The Information Society 19 (2003): 279–86.
10. Horrigan, "Home Broadband Adoption," 12.
11. Gillett, Lehr, and Osorio, "Local Government Broadband …
Some considered 2000 the year of the e-book, and due to the dot-com bust, that could have been the format's high-water mark. However, the first quarter of 2004 saw the greatest number of e-book purchases ever, with more than $3 million in sales. A 2002 consumer survey found that 67 percent of respondents wanted to read e-books; 62 percent wanted access to e-books through a library. Unfortunately, the large amount of information written on e-books has begun to develop myths around their use, functionality, and cost. The author suggests that these myths may interfere with the role of libraries in helping to determine the future of the medium and access to it. Rather than fixate on the pros and cons of current versions of e-book technology, it is important for librarians to stay engaged and help clarify the role of digital documents in the modern library.
Although 2000 was unofficially proclaimed as the year of the electronic book, or e-book, due in part to the highly publicized release of a Stephen King short story exclusively in electronic format, the dot-com bust would derail a number of high-profile e-book endeavors. With far less fanfare, the e-book industry has been slowly recovering. In 2004, e-books represented the fastest-growing segment of the publishing industry. During the first quarter of that year, more than four hundred thousand e-books were sold, a 46 percent increase over the previous year's numbers.1 E-books continue to gain acceptance with some readers, although their place in history is still being determined—fad? Great idea too soon? Wrong approach at any time? The answers partly depend on the reader's perspective.
The main focus of this article is the role of e-book technologies in libraries. Libraries have always served as repositories of the written word, regardless of the particular medium used to store the words. From the ancient scrolls of Qumran to the hand-illuminated manuscripts of medieval Europe to the familiar typeset codices of today, the library's role has been to collect, organize, and share ideas via the written word. In today's society, the written word is increasingly encountered in digital form. Writers use word processors; readers see words displayed; and researchers can scan countless collections without leaving the confines of the office. For self-proclaimed book lovers, the digital world is not necessarily an ideal one. Emotional reactions are common when one imagines a world without a favorite writing pen or the musty-smelling, yellowed pages of a treasured volume from youth. One of the battle lines between the traditional bibliophile and the modern technologist is drawn over the concept of the e-book. Some see this digital form of the written word as an evolutionary step beyond printed texts, which have sometimes been humorously dubbed tree-books. Although a good deal of attention has been generated by the initial publicity regarding newer e-book technologies, the apparent failures of most of them have begun to establish myths around the concept. Abram points out that the relative success of e-books in niche areas (such as reference works) is in direct contrast with the public opinion of those purchasing novels and popular literature through traditional vendors.2 Crawford paraphrases Lewis Carroll in describing this confusion: "When you cope with online content about e-books, you can believe six impossible things before breakfast."3 Incidentally, this article will attempt to dispel a mere five of the myths about e-books.
The future of e-books and the critical role of libraries in this future are best served by uncovering these myths and seeking a balanced, reasoned view of their potential. A 2002 consumer survey on e-books found that 67 percent of respondents wanted to read an e-book, and 62 percent wanted that access to be from a library.4 Underlying this position is the assumption that the ideas represented by the written word are of paramount importance to both writers and readers. It is also assumed that libraries will continue their critical role in collecting, organizing, and sharing information.
■ Myth 1—E-books represent a new idea that has failed
Many libraries have invested in various forms of e-book delivery with mixed results.5 Sottong wisely warns of the premature adoption of e-book technology, which he dubs a false pretender as a replacement for printed texts.6 However, the last five years are but a small part of a longer history and, presumably, a still longer future. As is often the case with computer jargon, the term e-book has emerged and gained currency in a very short amount of time. However, the concept of providing written texts in an electronic format has existed for a long time, as demonstrated by Bush's description of the memex.7
Dispelling Five Myths about E-books | James E. Gall
James E. Gall (james.gall@unco.edu) is assistant professor of educational technology at the University of Northern Colorado, Greeley.
26 Information Technology and Libraries | March 2005
The Gutenberg Project put theory into practice by converting traditional texts into digital files as early as 1971.8 Even if the e-book merely represents the latest incarnation of the concept, it does so tenuously. Books in their present form have a history of hundreds of years, or thousands if their parchment and papyrus ancestors are included. This history is rich with successes and failures of technology.
For example, Petroski presents an interesting historical examination of the problem of storing books when the one book–one desk model collapsed under the proliferation of available texts.9 Similarly, a determination on the success or failure of e-books, or digital texts, based upon a relatively short period of time is fraught with difficulty. Rather, it is important to look at recent developments as merely a next step. The technology is clearly not ready for uncritical, widespread acceptance, but it is also deserving of more than a summary dismissal.
■ Myth 2—E-books are easily defined
The term e-book means different things depending on the context. At the simplest, it refers to any primarily textual material that is stored digitally to be delivered via electronic display. One of the confusing aspects of defining e-books is that in the digital world, information and the media used to store, transfer, and view it are loosely coupled. An e-book in digital form can be stored on CD-ROM or any number of other media and then passed on through computer networks or telephone lines. The device used to view an e-book could be a standard computer, a personal digital assistant (PDA), or an e-book reader (the dedicated piece of equipment on which an e-book can be read; confusingly, also referred to as an e-book). Technically, virtually any computing device with a display could be used as an e-book reader. From a practical point of view, our eyes might not tolerate reading great lengths of text on a wireless phone, and banks will not likely provide excerpts of Chaucer during ATM transactions. Another important factor in defining e-books is the actual content. A conservative definition is that an e-book is an electronic copy or version of a printed text. This appears to be the predominant view of publishers.
Purists often maintain that a true e-book is one that is specifically written for that format and not available in traditional printed form.10 This was one of the categories of the short-lived (2000–2002) Frankfurt E-book Awards. Of course, the multitude of textual materials that could be delivered via the technology exceeds these definitions. Magazines, primary-source documents, online commentaries and reviews, and transcripts of audio or video presentations are just a short list of nonbook materials that are finding their way into e-book formats. One can note with some sense of irony that the technology behind the web was originally designed as a way for scientists to disseminate research reports.11 Despite the web's popularity, reading research reports makes up an exceedingly small percentage of its use today. Although there is a continuing effort to reach a common standard for e-books (see www.openebook.org/), the current marketplace contains numerous noncompatible formats. This noncompatibility is the result of both design and competitive tradeoffs. In the case of the former, there is a distinct philosophical difference between formats that attempt to retain the original look and navigation of the printed page (such as Adobe's popular PDF files) versus those that retain the text's structure but allow variability in its presentation (as best exemplified by the free-flowing nature of texts presented as HTML pages). This difference can also be seen in the functionality built around the format. Traditional systems provide readers with familiar book characteristics such as a table of contents, bookmarks, and margin notes, a view that could be named bibliocentric. The alternative is one that takes more advantage of the new medium and could be labeled technocentric, as can most easily be seen in the extensive use of hyperlinking.12 The simplest use of hyperlinking provides an easy form of annotating texts and presenting related texts.
On the other extreme, hyperlinks are used in the creation of nonlinear texts in which the followed links provide a unique context for building meaning on the part of the reader.13 It is interesting to note that a preliminary study of e-book features found that the most desirable features tended to reflect the functionality of traditional books and the least desirable features provided functionality not found there.14 Competitive tradeoffs are a critical issue at the current point of e-book development. The current profit models of publishing entities and the copyright concerns of authors seem naturally opposed to e-book formats in which texts are freely shared, duplicated, and distributed. For example, the Open eBook Forum is the most prominent organization devoted to the development of standards for e-book technologies. In late 2004, its web site listed seventy-six current members. Although the American Library Association is a member, it is one of only six members representing library-oriented organizations. In comparison, thirty-five members (or 46 percent) are publishing organizations, and thirteen (or 17 percent) are technology companies.15 The number of traditional publishers versus technology companies on this list may suggest that a bibliocentric view of e-books would be more favored. This also appears to confirm one media prediction that traditional publishers would continue to dominate efforts with this new medium.16 However, the limited representation of libraries in this endeavor is troubling (despite the disclaimer of using an admittedly rough metric for measuring impact). It is clear that many industry formats attempt to limit the ability to distribute materials by keying files so that they may only be viewed on one device or a specific installed version of the reader software. This creates technological problems for entities like libraries that attempt to provide access to information for various parties.
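The device-keying idea described above can be illustrated in miniature: derive a key from a device identifier and use it to scramble the text, so that only the matching device recovers a readable copy. This is a toy sketch, not any vendor's actual DRM; the XOR "cipher," the `device_id` strings, and all function names are invented for the example (real systems use vetted ciphers and license servers).

```python
import hashlib

def derive_key(device_id: str) -> bytes:
    # Toy key derivation: hash the device identifier (illustration only).
    return hashlib.sha256(device_id.encode("utf-8")).digest()

def _xor(data: bytes, key: bytes) -> bytes:
    # Repeat the key over the data; XOR is its own inverse.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def lock_to_device(text: str, device_id: str) -> bytes:
    # "Key" the file to one device, as some e-book formats do.
    return _xor(text.encode("utf-8"), derive_key(device_id))

def open_on_device(blob: bytes, device_id: str) -> str:
    # Only the matching device identifier recovers readable text;
    # any other identifier yields gibberish or a decode error.
    return _xor(blob, derive_key(device_id)).decode("utf-8")

locked = lock_to_device("Call me Ishmael.", "reader-A")
print(open_on_device(locked, "reader-A"))  # prints: Call me Ishmael.
```

The sketch makes the library's problem concrete: a file locked to `reader-A` is useless on any other patron's device unless the lender can re-key it.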
The concept of fair use of copyrighted materials has to be reexamined under an entirely new set of assumptions. Another irony is that the availability of free, public-domain materials in e-book format can be viewed as negative by the publishing industry. After investing considerable time and effort in developing e-book technology, publishers would prefer that users continue purchasing new e-book material rather than spend time reading the vast library of free historical material. Many of these content issues are currently being played out in the courts and the marketplace, particularly with regard to digital music and video.17 Although one can humorously imagine the so-called problems associated with a population obsessed with downloading and reading great literature, the precedents set by these popular media will have a direct impact on the future of digital texts. Despite the labor required to scan or key entire print books into digital formats, there have been some reports of this type of piracy.18 Other models for the dissemination of digital intellectual property that are not determined by traditional material concerns of supply and demand will continually be attempted. For example, Nelson predicted a hypertext-publishing scheme in which all material was available, but royalties were distributed according to actual access by end users.19 Theoretically, such a system would provide a perfect balance between access and profitability. In Nelson's words, "nothing will ever be misquoted or out of context, since the user can inquire as to the origins and native form of any quotation or other inclusion. Royalties will be automatically paid by a user whenever he or she draws out a byte from a published document."20
■ Myth 3—E-books and printed books are competing media
Many, if not most, published articles regarding e-books follow classic plot construction; the writer must present a protagonist and an antagonist.
Bibliophiles place the printed page as the hero and the e-book as the potential bane of civilization. Proulx, one such author, was quoted as saying, "Nobody is going to sit down and read a novel on a twitchy little screen—ever."21 Technologists cast the e-book as the electronic savior of text, replacing the tired tradition of the printed word in the same way the printed word replaced oral traditions. Hawkins quotes an author who claims that e-books are "a meteor striking the scholarly publication world." His slightly more restrained view was that e-books had the potential "to be the most far-reaching change since Gutenberg's invention."22 Grant places this metaphorical battle at the forefront by titling an article "E-books: Friend or Foe?"23 Before deciding which side to take, consider whether this clash of media is an appropriate metaphor. This author has introduced samples of current e-book technology in graduate classes he has taught. When presented with the technology as part of the coursework, students quickly declare their allegiances. Bibliophiles most often suggest that the technology will never replace the love of curling up with a good book. The technologists will ask how many pages can be stored in the device and then fantasize about the types of libraries they can carry and the various venues for reading that they will explore. However, after a few weeks of using the devices, both groups tend to move to a middle ground of practical use. At that point, the discussion turns to what materials are best left on the printed page (usually described as pleasure reading) and what would be useful in e-book format (reference works, course catalogs, how-to manuals). Other instructors have reported similar patterns of use.24 At this point, the observation is largely anecdotal, but it does call into question the perceived need for a decisive referendum on the value of e-books. The issue is not whether e-books will replace the printed word.
The concern of librarians and others involved in the infrastructure of the book should be on developing the proper role for e-books in a broader culture of information. Unless this approach is taken, the true goal of libraries—disseminating information to the public—will suffer. The gap between bibliophile and technologist approaches can already be seen in the materials available in e-book format. The publishing industry in general treats the e-book as just another format, releasing the same titles in hardcover, book-on-tape, and e-book at the same time. On the opposite end of the spectrum, technologists have adopted various e-book formats for creating and transferring numerous reference documents. Given their preferences, it is easy to find e-book references on Unix, HTML coding, and the like, but there is a scarcity of materials in philosophy, history, and the arts. Librarians seem the most appropriate group for developing shared understanding. Publishers and e-book hardware and software manufacturers need to be concerned with the bottom line. Libraries, by design, are concerned with the preservation of information and its continued dissemination long after the need to sell a particular book has passed. The hobby of creating and transferring texts to digital form is idiosyncratic and unorganized when viewed from the highest levels. Libraries not only contain expertise in all areas of human endeavor, but also have strategies for categorizing and maintaining information in productive ways. In short, libraries are the best line of defense for maintaining the value of the printed page and promoting the value of digital texts.
■ Myth 4—E-books are expensive
A common complaint about e-books is that they are expensive. On the surface, this seems clear.
dedicated e-book readers seemed to bottom out at around $300, and a new bestseller in e-book format is priced about the same as the hardcover edition. add the immediate and long-term costs of rechargeable batteries and the electricity needed to power them, and the economic case against the e-book appears closed. what if we turn the same critical eye to the printed page? the manufacture and distribution of printed texts is highly developed and astounding. when gutenberg succeeded in putting the christian bible in the hands of the moneyed public, he surely could not have comprehended the billions of copies that would eventually be distributed. even with the wealth of printed material at hand, one must still consider the high cost of the system. the law of supply and demand rules books as a tangible product. the most profitable books are those that will reach the most readers. specialized texts have limited audiences and, therefore, will usually be priced higher. this produces problems for both groups. popular texts must be printed in high quantities and delivered to various outlets. unfortunately, the printed page does have maintenance costs. sellen and harper point out that the actual printing cost is insignificant compared with the cost of dealing with documents after printing. they cite one study that indicated that united states businesses spend about $1 billion per year designing and printing forms, but spend an additional $25 to $35 billion filing, storing, and retrieving them.25 books are no different; as any librarian knows, it costs money to maintain a collection and protect texts from the environment and the effects of age. in the retail arena, the competition is fiercer. books that do not sell are removed in favor of those that do. 
it is estimated that 10 percent of texts printed each year are turned to pulp, although, fortunately, many are recycled.26 the bbc reported that more than two million former romance novels were used in the construction of a new tollway.27 with more specialized texts, the problem is not wealth, but scarcity. if a text is not profitable, it will probably go out of print. this is often synonymous with inaccessible. from the publisher’s perspective, it is only cost-effective to commit to a printing when the demand is high enough. a library is a good source of out-of-print texts, provided that it has been funded appropriately to acquire and maintain the particular works that are needed. e-books are not a panacea. other innovations, such as on-demand publishing, may be part of the answer in solving the economic issues regarding collections. however, e-books can help alleviate some of these issues. e-books are easily copied and distributed, which is a boon to the researcher and information consumer. in many cases, the goal is the access to information, not the possession of a book. it could also benefit the author and publisher if appropriate reimbursement systems are put into place. as previously described, nelson originally envisioned his online hypertext system, xanadu, with a mechanism for royalties based on access—a supply-and-demand system for ideas, not materials.28 the systems used to manage access to digital materials continue to increase in complexity and have spawned a whole new business of digital rights management (drm).29 examples include reciprocal (www.reciprocal.com), overdrive (www.overdrive.com), and netlibrary (www.netlibrary.com). 
libraries are the specific target of netlibrary, which promotes an e-books-on-demand project that allows free access for short periods of time.30 the creation of a standard digital object identifier (doi) for published materials may also help online publishers and entities like libraries manage their digital collections more easily.31 online music systems, such as apple’s itunes (www.itunes.com), strike a workable balance between quick-and-easy access to music and a workable, economic model for reimbursing artists. e-books also have appeal for special audiences who already require assistive technologies for accessing print collections.32 having discussed the hidden costs of printed texts, another important economic issue of e-books to examine is a current trend in usage. despite the availability of dedicated e-book readers, the largest growth in e-book usage is surely in nondedicated devices. e-book–reading software is available for personal computers, laptops, and pdas. according to one source, microsoft had sold four million pocketpc e-book-enabled devices, and had two million downloads of the ms reader for the personal computer; palm had sold approximately 20 million e-book-enabled devices; and adobe had more than 30 million acrobat readers downloaded.33 these numbers alone indicate some 24 million reader-capable pdas, and 32 million reader-capable pcs, for a total of 56 million devices. although it is difficult to find data on actual use, one online bookseller reported some data on e-book use from an audience survey.34 although 88 percent had purchased books online, only 16 percent had read an e-book (11 percent using a pc, 3 percent on a handheld device, and 2 percent on both). it is presumed that in most cases this equipment was purchased for other reasons, with e-book reading being a secondary function. as such, it would be unfair to include the full cost of this equipment in any calculation of the cost of providing information in an e-book format. 
if it were, the cost of providing artificial lighting in any building where reading takes place would need to be calculated as part of the cost of the printed page. the potential user base for the e-book rises as more computers and pdas are sold, decreasing the need for special equipment. this does not mean that the dedicated e-book reader is obsolete. by most commercial accounts, the apple newton was a failure. its bulky size and awkward interface were the subject of much ridicule. however, it did introduce the concept of the pda. the success of the palm line of products owes much to the proof of concept provided by the newton. the makers of the portable gameboy videogame system are repositioning it for multimedia digital-content delivery, and plan to pilot a flash-memory download system for various content types, including e-books.35 innovative products such as e-paper are already developed in prototype form.36 they are likely to lead to another wave of dedicated e-book readers or provide e-book–reading potential embedded in other consumer applications.
■ myth 5—e-books are a passing fad
it is trendy to list the failures of past media (such as radio, film, and television) in impacting education despite great initial promise.37 however, all those media are still with us after having found particular niches within our culture. if the e-book is viewed as just an alternative format, comparisons with past experiences of library collections containing videotapes, record albums, and such are not appropriate.38 however, if e-books are viewed as a tool or way to access information, the questions change. instead of asking how digital formats will replace print collections, we can ask how an e-book version will extend the reach of our current collection or provide our readers with resources previously unavailable or unaffordable. 
when trying to locate a research article, one is generally not concerned with whether the local library has a loose copy, bound copy, microform, microfiche, or even has to resort to interlibrary loan. as long as the content is accessible and can be cited, it can be used. electronic access to journal content is becoming more common. perhaps dry journal articles do not conjure up the same romantic visions of exploring the stacks that may hinder greater acceptance of e-books. a parallel can be drawn to the current work of film-restoration experts. the medium of film has reached an age where some of the earliest influential works no longer exist or are in a condition of rapid deterioration. according to one film site, more than half of the films made before 1950 have already been lost due to decay of existing copies.39 the work of restoration involves finding what remains of a great work in various vaults and collections. often, the only usable film is a second- or third-generation copy. from digitized copies, cleaning, color correction, and other painstaking work, a restored and—it is hoped—complete work emerges. ironically, once this laborious process is completed, a near-extinct classic is suddenly available to millions in the form of a dvd disc at a local retailer. what if the same attitude was taken with the world’s collections of printed materials? jantz has described potential impacts of e-book technology on academic libraries.40 lareau conducted a study on using e-books to replace lost books at kent state university, but found that limited availability and high costs did not make it feasible at the time.41 project gutenberg (www.gutenberg.net) and the electronic text center at the university of virginia (http://etext.lib.virginia.edu) are two examples of scholars attempting to save and share book content in electronic forms, but more efforts are needed. unfortunately, the shift to digital content has also contributed to the sheer volume of content available. 
edwards has recently discussed issues in attempting to archive and preserve digital media.42 the web may be suffering from a glut of information, but the content is highly skewed toward the new and technology oriented. in a few years, we may find that nontechnology-related endeavors are no longer represented in our information landscape.
■ conclusion
the e-book industry is currently dominated by commercial-content providers, such as franklin, and software companies, most notably adobe, palm, and microsoft. traditional print-based publishers have also maintained continued interest in the medium. it is assumed that these publishers had the capital to weather the ups and downs of the industry more so than new publishers dedicated solely to e-book delivery. although the contributions and efforts of these organizations are needed, the future of e-book content should not be left to their largesse. when the rocket e-book device was initially released, a small but loyal following of readers contributed thousands of titles to its online library. some of these titles were self-published vanity projects or brief reference documents, but many were public-domain classics, painstakingly scanned or keyed in by readers wishing to share their favorite reads. when gemstar purchased rocket, the software’s ability to create non-purchased content was curtailed and the online library of free titles dismantled. apparently, both were viewed as limiting the profitability of the e-book vendor. however, gemstar recently announced that it was discontinuing its e-book reading devices, one would assume due to a lack of profitability. this can be seen as a cautionary tale for libraries, which often define success by number of volumes available and accessed rather than units sold. committing to a technology that concurrently requires consumer success can be problematic. bibliophile and technologist alike must take responsibility for the future of our collective information resources. 
the bibliophile must ensure that all aspects of human knowledge and creativity are nurtured and allowed to survive in electronic forms. the technologist must ensure that accessibility and intellectual-property rights are addressed with every technological innovation. parry provides three concrete suggestions for public libraries in response to new media demands: continue to acknowledge and respond to customer demands, revisit the library’s mission statement for currency, and promote or accelerate shared agreements with other institutions to alleviate the high costs of accumulating resources.43 the proper frame of mind for these activities is suggested by levy: we make a mistake, i believe, when we fixate on particular forms and technologies, taking them in and of themselves, to be the carriers of what we want to embrace or resist. . . . it isn’t a question, it needn’t be a question, of books or the web, of letters or e-mail, of digital libraries or the bricks-and-mortar variety, of paper or digital technologies. . . . these modes of operation are only in conflict when we insist that one or the other is the only way to operate.44 in the early 1930s, lomax dragged his primitive audio-recording equipment over the roads of the american south to capture the performances of numerous folk musicians.45 at the time, he certainly didn’t imagine that at one point in history someone with a laptop computer sitting in a coffee shop with wireless access could download the performances of robert johnson from itunes. however, without his efforts, those unique voices in our history would have been lost. it is hoped that the readers of the future will be thanking the library professionals of today for preserving our print collections and enabling their access digitally via our primitive, but evolving, e-book technologies.
references
1. 
open e-book forum, “press release: record e-book retail sales set in q1 2004,” june 4, 2004. accessed dec. 27, 2004, www.openebook.org. 2. stephen abram, “e-books: rumors of our death are greatly exaggerated,” information outlook 8, no. 2 (2004): 14–16. 3. walt crawford, “the white queen strikes again: an e-book update,” econtent 25, no. 11 (2002): 46–47. 4. harold henke, “consumer survey on e-books.” accessed dec. 27, 2004, www.openebook.org. 5. sue hutley, “follow the e-book road: e-books in australian public libraries,” aplis 15, no. 1 (2002): 32–37; andrew k. pace, “e-books: round two,” american libraries 35, no. 8 (2004): 74–75; michael rogers, “librarians, publishers, and vendors revisit e-books,” library journal 129, no. 7 (2004): 23–24. 6. stephen sottong, “e-book technology: waiting for the ‘false pretender,’” information technology and libraries 20, no. 2 (2001): 72–80. 7. vannevar bush, “as we may think,” atlantic monthly 176, no. 1 (1945): 101–108. 8. michael s. hart, “history and philosophy of project gutenberg.” accessed dec. 27, 2004, www.gutenberg.net/ about.shtml. 9. henry petroski, the book on the bookshelf (new york: vintage, 2000). 10. steve ditlea, “the real e-books,” technology review 103, no. 4 (2000): 70–73. 11. tim berners-lee, weaving the web: the original design and ultimate destiny of the world wide web by its inventor (new york: harpercollins, 1999). 12. james e. gall and annmari m. duffy, “e-books in a college course: a case study” (presented at the association for educational communications and technology conference, atlanta, ga., nov. 8–10, 2001). 13. george p. landow, hypertext 2.0: the convergence of contemporary critical theory and technology (baltimore, md.: johns hopkins univ. pr., 1997). 14. harold henke, “survey on electronic book features.” accessed dec. 27, 2004, www.openebook.org. 15. open e-book forum, “press release: record e-book retail sales set in q1 2004.” 16. 
lori enos, “report: e-book industry set to explode,” e-commerce times, 20 dec. 2000. accessed dec. 27, 2004, www.ecommercetimes.com/story/6215.html. 17. luis a. ubinas, “the answer to video piracy,” mckinsey quarterly no. 1. accessed dec. 27, 2004, www.mckinseyquarterly.com. 18. mark hoorebeek, “e-books, libraries, and peer-to-peer file-sharing,” australian library journal 52, no. 2 (2003): 163–68. 19. theodor h. nelson, “managing immense storage,” byte 13, no. 1 (1988): 225–38. 20. ibid., 238. 21. jacob weisberg, “the way we live now: the good e-book,” new york times, 4 june 2000. accessed dec. 27, 2004, www.nytimes.com. 22. donald t. hawkins, “electronic books: a major publishing revolution. part 1: general considerations and issues,” online 24, no. 4 (2000): 14–28. 23. steve grant, “e-books: friend or foe?” book report 21, no. 1 (2002): 50–54. 24. lori bell, “e-books go to college,” library journal 127, no. 8 (2002): 44–46. 25. abigail j. sellen and richard h. harper, the myth of the paperless office (cambridge, mass.: mit pr., 2002). 26. stephen moss, “pulped fiction,” sydney morning herald, 29 mar. 2002. accessed dec. 27, 2004, www.smh.com.au. 27. bbc news, “m6 toll built with pulped fiction,” bbc news uk edition, 18 dec. 2003. accessed dec. 27, 2004, http://news.bbc.co.uk. 28. nelson, “managing immense storage.” 29. michael a. looney and mark sheehan, “digitizing education: a primer on e-books,” educause 36, no. 4 (2001): 38–46. 30. brian kenney, “netlibrary, ebsco explore new models for e-books,” library journal 128, no. 7 (2003). 31. stephen h. wildstrom, “a library to end all libraries,” business week (july 23, 2001): 23.
online.” they have implemented several process improvements already and will complete their work by the 2005 ala annual conference. this past fall, michelle frisque, lita web manager, conducted a survey of our members about the lita web site. 
michelle and the web coordinating committee are already working on a new look and feel for the lita web site based on the survey comments, and the result promises to be phenomenal. on top of all of the current activities (new vision statement, strategic planning, and the lita web site redesign), mary taylor and the lita board worked with a graphic designer to develop a new lita logo. after much deliberation, the new logo debuted at the 2004 lita national forum with great enthusiasm. many members commented that the new logo expresses the “energy” of lita and felt the change was terrific. with your help, lita had a very successful conference in orlando. although there were weather and transportation difficulties, the lita programs and discussions were of the highest quality, as always. the program and preconference offerings for the upcoming annual conference in chicago promise to be as strong as ever. don’t forget, lita also offers regional institutes throughout the year. check the lita web site to see if there’s a regional institute scheduled in your area. lita held another successful national forum in fall 2004 in st. louis, “ten years of connectivity: libraries, the world wide web, and the next decade.” the three-day educational event included excellent preconferences, general sessions, and more than thirty concurrent sessions. i want to thank the wonderful 2004 lita national forum planning committee, chaired by diane bisom, the presenters, and the lita office staff who all made this event a great experience. the next lita national forum will be held at the san jose marriott, san jose, california, september 29–october 2, 2005. the theme will be “the ubiquitous web: personalization, portability, and online collaboration.” thomas dowling, chair, and the 2005 lita national forum planning committee are preparing another “must attend” event. next year marks lita’s fortieth anniversary. 
2006 will be a year for lita to celebrate our history, future, and our many accomplishments. we are fortunate to have lynne lysiak leading the fortieth anniversary task force activities. i know we all will enjoy the festivities. i look forward to working with many of you as we continue to make lita a wonderful and vibrant association. i encourage you to send me your comments and suggestions to further the goals, services, and activities of lita.
32. terence cavanaugh, “e-books and accommodations: is this the future of print accommodation?” teaching exceptional children 35, no. 2 (2002): 56–61. 33. skip pratt, “e-books and e-publishing: ignore ms reader and palm os at your own peril,” knowledge download, 2002. accessed dec. 27, 2004, www.knowledge-download.com/260802-e-book-article. 34. davina witt, “audience profile and demographics,” mar./apr. 2003. accessed dec. 27, 2004, www.bookbrowse.com/media/audience.cfm. 35. geoff daily, “gameboy advance: not just playing with games,” econtent 27, no. 5 (2004): 12–14. 36. associated press, “flexible e-paper on its way,” associated press, 7 may 2003. accessed dec. 27, 2004, www.wired.com/news. 37. richard mayer, multimedia learning (cambridge, uk: cambridge university press, 2000). 38. sottong, “e-book technology.” 39. amc, “film facts: read about lost films.” accessed june 19, 2003, www.amctv.com/article?cid=1052. 40. ronald jantz, “e-books and new library service models: an analysis of the impact of e-book technology on academic libraries,” information technology and libraries 20, no. 2 (2001): 104–15. 41. susan lareau, the feasibility of the use of e-books for replacing lost or brittle books in the kent state university library, 2001, eric, ed 459862. accessed dec. 27, 2004, http://searcheric.org. 42. eli edwards, “ephemeral to enduring: the internet archive and its role in preserving digital media,” information technology and libraries 23, no. 1 (2004): 3–8. 43. 
norm parry, format proliferation in public libraries, 2002, eric, ed 470035. accessed dec. 27, 2004, http://searcheric.org. 44. david m. levy, scrolling forward: making sense of documents in the digital age (new york: arcade pub., 2001). 45. about alan lomax. accessed dec. 27, 2004, www.alan-lomax.com/about.html.
(president’s column continued from page 2)
information technology and libraries | september 2007
wikis in libraries
matthew m. bejune
wikis have recently been adopted to support a variety of collaborative activities within libraries. this article and its companion wiki, librarywikis (http://librarywikis.pbwiki.com/), seek to document the phenomenon of wikis in libraries. this subject is considered within the framework of computer-supported cooperative work (cscw). the author identified thirty-three library wikis and developed a classification schema with four categories: (1) collaboration among libraries (45.7 percent); (2) collaboration among library staff (31.4 percent); (3) collaboration among library staff and patrons (14.3 percent); and (4) collaboration among patrons (8.6 percent). examples of library wikis are presented within the article, as is a discussion of why wikis are primarily utilized within categories i and ii and not within categories iii and iv. it is clear that wikis have great utility within libraries, and the author urges further application of wikis in libraries.
in recent years, the popularity of wikis has skyrocketed. wikis were invented in the mid-1990s to help facilitate the exchange of ideas between computer programmers. the use of wikis has gone far beyond the domain of computer programming, and now it seems as if every google search contains a wikipedia entry. wikis have entered into the public consciousness. 
so, too, have wikis entered into the domain of professional library practice. the purpose of this research is to document how wikis are used in libraries. in conjunction with this article, the author has created librarywikis (http://librarywikis.pbwiki.com/), a wiki to which readers can submit additional examples of wikis used in libraries. the article will proceed in three sections. the first section is a literature review that defines wikis and introduces computer-supported cooperative work (cscw) as a context for understanding wikis. the second section documents the author’s research and presents a schema for classifying wikis used in libraries. the third section considers the implications of the research results.
■ literature review
what’s a wiki? wikipedia (2007a) defines a wiki as: a type of web site that allows the visitors to add, remove, edit, and change some content, typically without the need for registration. it also allows for linking among any number of pages. this ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring. wikis have been around since the mid-1990s, though it is only recently that they have become ubiquitous. in 1995, ward cunningham launched the first wiki, wikiwikiweb (http://c2.com/cgi/wiki), which is still active today, to facilitate the exchange of ideas among computer programmers (wikipedia 2007b). the launch of wikiwikiweb was a departure from the existing model of web communication, where there was a clear divide between authors and readers. wikiwikiweb elevated the status of readers, if they so chose, to that of content writers and editors. this model proved popular, and the wiki technology used on wikiwikiweb was soon ported to other online communities, the most famous example being wikipedia. on january 15, 2001, wikipedia was launched by larry sanger and jimmy wales as a complementary project for the now-defunct nupedia encyclopedia. 
nupedia was a free, online encyclopedia with articles written by experts and reviewed by editors. wikipedia was designed as a feeder project to solicit new articles for nupedia that were not submitted by experts. the two services coexisted for some time, but in 2003 the nupedia servers were shut down. since its launch, wikipedia has undergone rapid growth. at the close of 2001, wikipedia’s first year of operation, there were 20,000 articles in eighteen language editions. as of this writing, there are approximately seven million articles in 251 languages, fourteen of which have more than 100,000 articles each. as a sign of wikipedia’s growth, when this manuscript was first submitted four months earlier, there were more than five million articles in 250 languages. author’s note: sources in the previous two paragraphs come from wikipedia. the author acknowledges the concerns within the academy regarding the practice of citing wikipedia within scholarly works; however, it was decided that wikipedia is arguably an authoritative source on wikis and itself. nevertheless, the author notes that there were changes—insubstantial ones—to the cited wikipedia entries between when the manuscript was first submitted and when it was revised four months later.
wikis and cscw
wikis facilitate collaborative authoring and can be considered one of the technologies studied under the domain of cscw. in this section, cscw is explained and it is shown how wikis fit within this framework. cscw is an area of computer science research that considers the application of computer technology to support cooperative, also referred to as collaborative, work. the term was first coined in 1984 by irene greif (1988) and paul cashman to describe a workshop they were planning on the support of people in work environments with computers. over the years there have been a number of review articles that describe cscw in greater detail, including bannon and schmidt (1991), rodden (1991), schmidt and bannon (1992), sachs (1995), dourish (2001), ackerman (2002), olson and olson (2002), dix, finlay, abowd, and beale (2004), and shneiderman and plaisant (2005). publication in the field of cscw primarily occurs through conferences. the first conference on cscw was held in 1986 in austin, texas. since then, the conference has been held biennially in the united states. proceedings are published by the association for computing machinery (acm, http://www.acm.org/). in 1991, the first european conference on computer supported cooperative work (ecscw) was held in amsterdam. ecscw also is held biennially, in odd-numbered years. ecscw proceedings are published by springer (http://www.ecscw.uni-siegen.de/). the primary journal for cscw is computer supported cooperative work: the journal of collaborative computing. publications also appear within publications of the acm and chi, the conference on human factors in computing.
cscw and libraries
as libraries are, by nature, collaborative work environments—library staff working together and with patrons—and as digital libraries and computer technologies become increasingly prevalent, there is a natural fit between cscw and libraries. the following researchers have applied cscw to libraries. twidale et al. (1997) published a report sponsored by the british library research and innovation centre that examined the role of collaboration in the information-searching process to inform how information systems design could better address and support collaborative activity.
matthew m. bejune (mbejune@purdue.edu) is an assistant professor of library science at purdue university libraries. he also is a doctoral student at the graduate school of library and information science, university of illinois at urbana-champaign. 
twidale and nichols (1998) offered ethnographic research of physical collaborative environments—in a university library and an office—to aid the design of digital libraries. they wrote two reviews of cscw as applied to libraries—the first was more comprehensive (twidale and nichols 1998) than the second (twidale and nichols 1999). sánchez (2001) discussed collaborative environments designed and prototyped for digital library environments.
classification of collaboration
technologies that facilitate collaborative work are typically classified within cscw across two continua: synchronous versus asynchronous, and co-located versus remote. if put together in a two-by-two matrix, there are four possibilities: (1) synchronous and co-located (same time, same place); (2) synchronous and remote (same time, different place); (3) asynchronous and remote (different time, different place); and (4) asynchronous and co-located (different time, same place). this classification schema was first proposed by johansen et al. (1988). nichols and twidale (1999) mapped work applications within the realm of cscw in figure 1. wikis are not present in the figure, but their absence is not an indication that they are not cooperative work technologies. rather, wikis were not yet widely in use at the time cscw was considered by nichols and twidale. the author has added wikis to nichols and twidale’s graphical representation in figure 2. interestingly, wikis are border-crossers fitting within two quadrants: the upper right—asynchronous and co-located; and the lower right—asynchronous and remote. wikis are asynchronous in that they do not require people to be working together at the same time. they are both co-located and remote in that people working collaboratively may not need to be working in the same place. it is also interesting to note that library technologies can be mapped using johansen’s schema. 
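johansen’s two-by-two scheme lends itself to a compact sketch. the following python fragment is purely illustrative and not from the article: the quadrant contents are paraphrased from the figures as described here, and the data structure and function name are invented. it shows how wikis, as border-crossers, turn up in both asynchronous quadrants:

```python
# illustrative sketch of johansen's two-by-two classification, keyed by
# (time, place). cell contents are paraphrased from the article's figures;
# note that "wikis" appears in both asynchronous cells.
TOOLS = {
    ("synchronous", "co-located"): ["meeting rooms"],
    ("synchronous", "remote"): ["video conferencing", "shared drawing", "collaborative writing"],
    ("asynchronous", "co-located"): ["team rooms", "organizational memory", "workflow", "wikis"],
    ("asynchronous", "remote"): ["web-based applications", "collaborative writing", "wikis"],
}

def quadrants_for(tool):
    """return every (time, place) quadrant in which a tool appears."""
    return [cell for cell, tools in TOOLS.items() if tool in tools]

print(quadrants_for("wikis"))
# [('asynchronous', 'co-located'), ('asynchronous', 'remote')]
```

because wikis sit in two cells, a one-to-one mapping from tool to quadrant would misrepresent them; returning a list of matching cells keeps the border-crossing visible.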
nichols and twidale (1999) also mapped this, and figure 3 illus trates the variety of collaborative work that goes on within libraries. ■ method in order to to discover the widest variety of wikis used in libraries, the author searched for examples of wikis used in libraries within three areas—the lis literature, the library success wiki, and within messages posted on three professional electronic discussion lists. when examples were found, they were logged and classified according to a schema created by the author. results are presented in the next section. the first area searched was within the lis literature. the author utilized the wilson library literature and figure 1. classification of cscw applications co-located remote synchronous asynchronous meeting rooms distributed meetings muds and moos shared drawing video conferencing collaborative writing team rooms organizational memory workflow web-based applications collaborative writing 2� information technology and libraries | september 20072� information technology and libraries | september 2007 information science database. there were two main types of articles: ones that argued for the use of wikis in libraries, and ones that were case studies of wikis that had been implemented. the second area searched was within library success: a best practices wiki (http://www.libsuccess.org/) (see figure 4), created by meredith farkas, distance learning librarian at norwich university. as the name implies, it is a place for people within the library community to share their success stories. posting to the wiki is open to the public, though registration is encouraged. there are many subject areas on the wiki, including management and leadership, readers’ advisory, reference services, infor mation literacy, and so on. there also is a section about collaborative tools in libraries (http://www.libsuccess .org/index.php?title=collaborative_tools_in_libraries), in which examples of wikis in libraries are presented. 
Within this section there is a presentation about wikis made by Farkas (2006) titled Wiki World (http://www.libsuccess.org/index.php?title=wiki_world), from which examples were culled. The third area that was searched was professional electronic discussion list messages from Web4Lib, DIG_REF, and LIBREF-L. The Web4Lib electronic discussion list (Tennant 2005) is “for the discussion of issues relating to the creation, management, and support of library-based World Wide Web servers, services, and applications.” The list is moderated by Roy Tennant and the Web4Lib advisory board and was started in 1994. The DIG_REF electronic discussion list is a forum for “people and organizations answering the questions of users via the internet” (WebJunction n.d.). The list is hosted by the Information Institute of Syracuse, School of Information Studies, Syracuse University, and was created in 1998. The LIBREF-L electronic discussion list is “a moderated discussion of issues related to reference librarianship” (Balraj 2005). Established in 1990, it is operated out of Kent State University and moderated by a group of list owners. These three electronic discussion lists were selected for two reasons. First, the author is a subscriber to each electronic discussion list, and prior to the research noted there were messages about wikis in libraries. Second, based on the descriptions of each electronic discussion list stated above, the selected electronic discussion lists reasonably covered the discussion of wikis in libraries within the professional library electronic discussion lists. One year of messages, November 15, 2005, through November 14, 2006, was analyzed for each list. Messages about wikis in libraries were identified through keyword searches against the author’s personal archive of electronic discussion list messages collected over the years.

Figure 2. Classification of CSCW applications including wikis. Synchronous and co-located: meeting rooms. Synchronous and remote: distributed meetings, MUDs and MOOs, shared drawing, video conferencing, collaborative writing. Asynchronous and co-located: team rooms, wikis. Asynchronous and remote: wikis, organizational memory, workflow, web-based applications, collaborative writing.

Figure 3. Classification of collaborative work within libraries. Synchronous and co-located: personal help, reference interview, issue of book on loan, face-to-face interactions. Synchronous and remote: use of OPACs, database search, video conferencing, telephone. Asynchronous and co-located: notice boards, post-it notes, memos, documents for study. Asynchronous and remote: social information filtering, e-mail, voicemail, distance learning, postal services.

Figure 4. Library Success: A Best Practices Wiki (http://www.libsuccess.org/)

Wikis in Libraries | Bejune

An alternative method would have been to search the web archive of each list, but the author found it easier to search within his mail client, Microsoft Outlook. The word “wiki” was found in 513 messages: 354 in Web4Lib, 91 in DIG_REF, and 68 in LIBREF-L. This approach had high recall, as discourse about wikis frequently included the use of the word “wiki,” though low precision, as there were many results that were not about wikis used in libraries. Common false hits included messages about the Nature study (Giles 2005) that compared Wikipedia to Encyclopaedia Britannica, and messages that included the word “wiki” but were simply referring to wikis, though not examples of wikis used within libraries. From the list of 513 messages, the author read each message and came up with a much shorter list of thirty-nine messages about wikis in libraries: thirty-two in Web4Lib, three in DIG_REF, and four in LIBREF-L.

■ Results

Classification of the results

After all wiki examples had been collected, it became clear that there was a way to classify the results.
In Farkas’s (2006) presentation about wikis, she organized wikis in two categories: (1) how libraries can use wikis with their patrons; and (2) how libraries can use wikis for knowledge sharing and collaboration. This schema, while it accounts for two types of collaboration, is not granular enough to represent the types of collaboration found within the wiki examples identified. As such, it became clear that another schema was needed. Twidale and Nichols (1998) identified three types of collaboration within libraries: (1) collaboration among library staff; (2) collaboration between a patron and a member of staff; and (3) collaboration among library users. Their classification schema mapped well to the examples of wikis that were identified; however, it too was not granular enough, as it did not distinguish between collaboration among library staff intra-organizationally and extra-organizationally, the two most common types of wiki usage found in the research (see appendix). To account for these types of collaboration, which are common not only to wiki use in libraries but to all professional library practice, the author modified Twidale and Nichols’s schema (see figure 6). The improved schema also uniformly represents entities across the categories: library staff and member of staff are referred to as “library staff,” and patrons and library users are referred to as “patrons.” Examples of wikis used in libraries for each category are provided to better illustrate the proposed classification schema.

■ Collaboration among libraries

The Library Instruction Wiki (http://instructionwiki.org/main_page) is an example of a wiki that is used for collaboration among libraries (figure 7). It appears as though the wiki was originally set up to support library instruction within Oregon (it is unclear if this was associated with a particular type of library, say academic or public), but now the wiki supports library instruction in general.
The wiki is self-described as “a collaboratively developed resource for librarians involved with or interested in instruction. All librarians and others interested in library instruction are welcome and encouraged to contribute.” The tagline for the wiki is “stop reinventing the wheel” (Library Instruction Wiki 2006).

Figure 5. Wiki World (http://www.libsuccess.org/index.php?title=wiki_world)

Figure 6. Four types of collaboration within libraries: 1. collaboration among libraries (extra-organizational); 2. collaboration among library staff (intra-organizational); 3. collaboration among library staff and patrons; 4. collaboration among patrons.

From this wiki there is a list of library instruction resources that includes the following: handouts, tutorials, and other resources to share; teaching techniques, tips, and tricks; class-specific websites and handouts; glossary and encyclopedia; bibliography and suggested reading; and instruction-related projects, brainstorms, and documents. Within the handouts, tutorials, and other resources to share section, the author found a wide variety of resources from libraries across the country. Similarly, there were a number of suggestions to be found under the teaching techniques, tips, and tricks section. Another example of a wiki used for collaboration among libraries is the Library Success wiki (http://www.libsuccess.org/), one of the sources of examples of wikis used in this research. Adding to earlier descriptions of this wiki as presented in this paper, Library Success seems to be one of the most frequently updated library wikis and perhaps the most comprehensive in its coverage of library topics.

■ Collaboration among library staff

The University of Connecticut Libraries’ staff wiki (http://wiki.lib.uconn.edu/) is an example of a wiki used for collaboration among library staff (figure 8).
This wiki is a knowledge base containing more than one thousand Information Technology Services (ITS) documents. ITS documents support the information technology needs of the library organization. Examples include answers to commonly asked questions, user manuals, and instructions for a variety of computer operations. In addition to being a repository of ITS documents, the wiki also serves as a portal to other wikis within the University of Connecticut Libraries. There are many other wikis connected to library units; teams; software applications, such as the libraries’ ILS; libraries within the University of Connecticut Libraries; and other University of Connecticut campuses. The Health Sciences Library Knowledge Base, Stony Brook University (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome), is another example of a wiki that is used for collaboration among library staff (figure 9). The wiki is described as “a space for the dynamic collaboration of the library staff, and a platform of shared resources” (Health Sciences Library 2007). On the wiki there are the following content areas: news and announcements; HSL departments; projects; troubleshooting; staff training resources, working papers, and support materials; and community activities, scholarship, conferences, and publications.

■ Collaboration among library staff and patrons

There are only a few examples of wikis used for collaboration among library staff and patrons to cite as exemplars. One example is the St. Joseph County Public Library (SJCPL) subject guides (http://www.libraryforlife.org/subjectguides/index.php/main_page), seen in figure 10. This wiki is a collection of resources and services in print and electronic formats to assist library patrons with subject area searching. As the wiki is published by library staff for public consumption, it has more of a professional feel than wikis from the first two categories.
Pages have images, and the content is structured to look like a standard web page. Though the wiki looks like a web page, there still remain a number of edit links that follow each section of text on the wiki. While these links bear importance for those editing the wiki (library staff only in this case), they undoubtedly puzzle library patrons who think that they have the ability to edit the wiki when, in fact, they do not.

Figure 7. Library Instruction Wiki (http://instructionwiki.org/)

Figure 8. The University of Connecticut Libraries’ staff wiki (http://wiki.lib.uconn.edu/)

Another example of collaboration between library staff and patrons that takes a similar approach is the USC Aiken Gregg-Graniteville Library website (http://library.usca.edu/), shown in figure 11. As with the SJCPL subject guides, this wiki looks more like a website than a wiki. In fact, the USC Aiken wiki conceals its true identity as a wiki even more so than the SJCPL subject guides. The only evidence that the website is a wiki is a link at the bottom of each page that says “powered by PmWiki.” PmWiki (http://pmwiki.org/) is a content management system that utilizes wiki technology on the back end to manage a website while retaining the look and feel of a standard website. It seems that the benefits of using a wiki in such a way are shared content creation and management.

■ Collaboration among patrons

As there are only three examples of wikis used for collaboration among patrons, all examples will be highlighted in this section. The first example is Wiki WorldCat (http://www.oclc.org/productworks/wcwiki.htm), sponsored by OCLC. Wiki WorldCat launched as a pilot project in September 2005. The service allows users of Open WorldCat, OCLC’s web version of WorldCat, to add book reviews to item records.
Though this wiki does not have many book reviews in it, even for contemporary bestsellers, it gives a taste of how a wiki could be used to facilitate collaboration among patrons. A second example is the Biz Wiki from Ohio University Libraries (http://www.library.ohiou.edu/subjects/bizwiki/index.php/main_page) (see figure 12). The Biz Wiki is a collection of business information resources available through Ohio University. The wiki was created by Chad Boeninger, reference and instruction librarian, as an alternate form of a subject guide or pathfinder. What separates this wiki from those in the third category, collaboration among library staff and patrons, is that the wiki is editable by patrons as well as librarians. Similarly, Butler WikiRef (http://www.seedwiki.com/wiki/butler_wikiref) is a wiki that has reviews of reference resources created by Butler librarians, faculty, staff, and students (see figure 13).

Figure 9. Health Sciences Library Knowledge Base (http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome)

Figure 10. SJCPL subject guides (http://libraryforlife.org/subjectguides/index.php/main_page/)

Figure 11. USC Aiken Gregg-Graniteville Library (http://library.usca.edu/)

Full results

Thirty-three wikis were identified. Two wikis were classified in two categories each. The full results are available in the appendix. Table 1 illustrates how wikis were not uniformly distributed across the four categories: category I had 45.7 percent, category II had 31.4 percent, category III had 14.3 percent, and category IV had 8.6 percent. Nearly 80 percent of all examples were found within categories I and II. As seen in some of the examples in the previous section, wikis were utilized for a variety of purposes.
Here is a short list of purposes for which wikis were utilized: sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. Wiki software utilization is summarized in tables 2 and 3. MediaWiki is the most popular software utilized by libraries (33.3 percent), followed by unknown (30.3 percent), PBwiki (12.1 percent), PmWiki (12.1 percent), Seedwiki (6.1 percent), TWiki (3 percent), and XWiki (3 percent). If the values for unknown are removed from the totals (table 3), MediaWiki is utilized in almost half (47.8 percent) of all library wiki applications.

■ Discussion

With a wealth of examples of wikis in categories I and II and a dearth of examples of wikis in categories III and IV, the library community seems to be more comfortable using wikis to collaborate within the community, but less comfortable using wikis to collaborate with library patrons or to enable collaboration among patrons. The research results pose the questions: why are wikis predominantly used for collaboration within the library community? And why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another?

Why are wikis predominantly used for collaboration within the library community? This is perhaps the easier of the two questions to explain. There is a long legacy of cooperation and collaboration intra-organizationally and extra-organizationally within libraries. One explanation for this is the shared budgetary climate within libraries. All too often there are insufficient money, staff, and resources to offer desired levels of service. Librarians work together to overcome these barriers.
Prominent examples include cooperative cataloging, interlibrary lending, and the formation of consortia to negotiate pricing. Another explanation can be found in the personal characteristics of library professionals. Librarianship is a service profession that consequently attracts service-minded individuals who are interested in helping others, whether they are library patrons or fellow colleagues. A third reason is the role of library associations, such as the International Federation of Library Associations and Institutions, the American Library Association, the Special Libraries Association, and the Medical Library Association, as well as many others at the international, national, state, and local levels, and the work that is done through these associations at annual conferences and throughout the year. Libraries use wikis to collaborate intra-organizationally and extra-organizationally because collaboration is what they do most naturally.

Figure 12. Ohio University Libraries Biz Wiki (http://www.library.ohiou.edu/subjects/bizwiki)

Figure 13. Butler WikiRef (http://www.seedwiki.com/wiki/butler_wikiref)

Why are wikis minimally used for collaborating with patrons and helping patrons to collaborate with one another? The reasons why libraries are only minimally using wikis to collaborate with patrons and for patron collaboration are more difficult to ascertain. However, due to the untapped potential of using wikis, the proposed answers to this question are more important and may lead to future implementations of wikis in libraries. Here are four possible explanations, some more speculative than others. First, perhaps one of the reasons is the result of the way in which libraries are conceived by library patrons and librarians alike. A strong case can be made for libraries as places of collaborative work, and the author takes this position.
However, historically libraries have been repositories of information, and this remains a pervasive and difficult concept to change: libraries are frequently seen simply as places to get books. In this scenario, the librarian is a gatekeeper that a patron interacts with to get a book, that is, if the patron interacts with a librarian at all. It also is worth noting that the relationship is one-way: the patron needs the assistance of the librarian, but not the other way around. Viewed in these terms, this is not a collaborative situation. For libraries to use wikis for the purpose of collaborating with library patrons, it might demand the reconceptualization of libraries by library patrons and librarians. Similarly, this extreme conceptualization of libraries does not consider patrons working with one another, even though it is an activity that occurs formally and informally within libraries, not to mention with the emergence of interdisciplinary and multidisciplinary work. If wikis are to be used to facilitate collaboration between patrons, the conceptualization of the library by library patrons and librarians must be expanded. Second, there may be fears within the library community about authority, responsibility, and liability. Libraries have long held the responsibility of ensuring the authority of the bibliographic catalog. If patrons are allowed to edit the library wiki, there is potential for negatively affecting the authority of the wiki and even the perceived authority of the library. Likewise, there is potential liability in allowing patrons to post to the library wiki.

Table 1. Classification summary
Category                                             No.      %
I: Collaboration among libraries                      16   45.7
II: Collaboration among library staff                 11   31.4
III: Collaboration among library staff and patrons     5   14.3
IV: Collaboration among patrons                        3    8.6
Total                                                 35  100.0

Table 2. Software totals
Wiki software   No.      %
MediaWiki        11   33.3
Unknown          10   30.3
PBwiki            4   12.1
PmWiki            4   12.1
Seedwiki          2    6.1
TWiki             1    3.0
XWiki             1    3.0
Total            33  100.0

Table 3. Software totals without unknowns
Wiki software   No.      %
MediaWiki        11   47.8
PBwiki            4   17.4
PmWiki            4   17.4
Seedwiki          2    8.7
TWiki             1    4.3
XWiki             1    4.3
Total            23  100.0

Similar concerns have been raised in the past about other collaborative technologies, such as blogs, bulletin boards, mailing lists, and so on, all aspects of the Library 2.0 movement. If libraries are fully to realize Library 2.0 as described by Casey and Savastinuk (2006), Miller (2006), and Courtney (2007), these issues must be considered. Third, perhaps it is a matter of fit. It might be the case that wikis are utilized in categories I and II and not within categories III and IV because the tools are better suited to support the types of activities within categories I and II. Consider some of the activities listed earlier: supporting association work, collecting software documentation, supporting conferences, creating digital repositories, creating intranets, and creating knowledge bases. Each of these illustrates a wiki that is utilized for the creation of a resource with multiple authors and readers, tasks that are well-suited to wikis. Wikipedia is a great example of a wiki with clear, shared tasks for multiple authors and multiple readers and a sense of persistence over time. In contrast, relationships between library staff and patrons do not typically lead to the shared creation of resources. While it is true that the relationship between patron and librarian in the context of a patron’s research assignment can be collaborative depending on the circumstances, authorship is not shared but is possessed by the patron. In addition, research assignments in the context of undergraduate coursework are short-lived and seldom go beyond the confines of a particular course.
In terms of patrons working together with other patrons, there is the precedent of group work; however, groups often produce projects or papers that share the characteristics of non-group research assignments listed above. This, of course, does not mean that wikis are not suitable for collaboration within categories III and IV, but perhaps the opportunities for collaboration are fewer, or they stretch the imagination of the types and ways of doing collaborative work. Fourth, perhaps it is a matter of “not yet.” While the research has shown that libraries are not utilizing wikis in categories III and IV, this may be because it is too soon. It should be noted that wikis are still new technologies. It might be the case that librarians are experimenting in safer contexts so they will gain experience prior to trying more public projects where their expertise will be needed. If this explanation is true, it is expected that more examples of wikis in libraries will soon emerge. As they do, the author hopes that all examples of wikis in libraries, new and old, will be added to the companion wiki to this article, LibraryWikis (http://librarywikis.pbwiki.com/).

■ Conclusion

It appears that wikis are here to stay and that their utilization within libraries is only just beginning. This article documented the current practice of wikis used in libraries using CSCW as a framework for discussion. The author located examples of wikis in three places: within the LIS literature, on the Library Success wiki, and within messages from three professional electronic discussion lists. Thirty-three examples of wikis were identified and classified using a classification schema created by the author. The schema has four categories: (1) collaboration among libraries; (2) collaboration among library staff; (3) collaboration among library staff and patrons; and (4) collaboration among patrons.
Wikis were used for a variety of purposes, including sharing information, supporting association work, collecting software documentation, supporting conferences, facilitating librarian-to-faculty collaboration, creating digital repositories, managing web content, creating intranets, providing reference desk support, creating knowledge bases, creating subject guides, and collecting reader reviews. By and large, wikis were primarily used to support collaboration among library staff intra-organizationally and extra-organizationally, with nearly 80 percent (45.7 percent and 31.4 percent, respectively) of the examples so identified, and less so in the support of collaboration among library staff and patrons (14.3 percent) and collaboration among patrons (8.6 percent). A majority of the examples of wikis utilized the MediaWiki software (47.8 percent). It is clear that there are plenty of examples of wikis utilized in libraries, and more to be found each day. It is at this time that the profession is faced with extending the use of this technology, and it is for the future to see how wikis will continue to be used within libraries.

Works cited

Ackerman, Mark S. 2002. The intellectual challenge of CSCW: The gap between social requirements and technical feasibility. In Human-computer interaction in the new millennium, ed. John M. Carroll, 179–203. New York: Addison-Wesley.

Balraj, Leela, et al. 2005. LIBREF-L. Kent State University Libraries. http://www.library.kent.edu/page/10391 (accessed June 12, 2007). Archive is available at this link as well.

Bannon, Liam J., and Kjeld Schmidt. 1991. CSCW: Four characters in search of a context. In Studies in computer supported cooperative work, ed. John M. Bowers and Steven D. Benford, 3–16. Amsterdam: Elsevier.

Casey, Michael E., and Laura C. Savastinuk. 2006. Library 2.0. Library Journal 131, no. 14: 40–42. http://www.libraryjournal.com/article/ca6365200.html (accessed June 12, 2007).

Courtney, Nancy. 2007.
Library 2.0 and beyond: Innovative technologies and tomorrow’s user (in press). Westport, Conn.: Libraries Unlimited.

Dix, Alan, et al. 2004. Socio-organizational issues and stakeholder requirements. In Human computer interaction, 3rd ed., 450–74. Upper Saddle River, N.J.: Prentice Hall.

Dourish, Paul. 2001. Social computing. In Where the action is: The foundations of embodied interaction, 55–97. Cambridge, Mass.: MIT Press.

Farkas, Meredith. 2006. Wiki World. http://www.libsuccess.org/index.php?title=wiki_world (accessed June 12, 2007).

Giles, Jim. 2005. Internet encyclopaedias go head to head. Nature 438: 900–01. http://www.nature.com/nature/journal/v438/n7070/full/438900a.html (accessed June 12, 2007).

Greif, Irene, ed. 1988. Computer supported cooperative work: A book of readings. San Mateo, Calif.: Morgan Kaufmann.

Health Sciences Library, State University of New York, Stony Brook. 2007. Health Sciences Library Knowledge Base. http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome (accessed June 12, 2007).

Johansen, Robert, et al. 1988. Groupware: Computer support for business teams. New York: Free Press.

Library Instruction Wiki. 2006. http://instructionwiki.org/main_page (accessed June 12, 2007).

Miller, Paul. 2006. Coming together around Library 2.0. D-Lib Magazine 12, no. 4. http://www.dlib.org/dlib/april06/miller/04miller.html (accessed June 12, 2007).

Nichols, David M., and Michael B. Twidale. 1999. Computer supported cooperative work and libraries. Vine 109: 10–15. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/vine.html (accessed June 12, 2007).

Olson, Gary M., and Judith S. Olson. 2002. Groupware and computer-supported cooperative work. In The human-computer interaction handbook: Fundamentals, evolving technologies and emerging applications, ed. Julie A. Jacko and Andrew Sears, 583–95. Mahwah, N.J.: Lawrence Erlbaum Associates.

Rodden, Tom T.
1991. A survey of CSCW systems. Interacting with Computers 3, no. 3: 319–54.

Sachs, Patricia. 1995. Transforming work: Collaboration, learning, and design. Communications of the ACM 38: 227–49.

Sánchez, J. Alfredo. 2001. HCI and CSCW in the context of digital libraries. In CHI ’01 extended abstracts on human factors in computing systems. Conference on Human Factors in Computing Systems, Seattle, Wash., Mar. 31–Apr. 5, 2001.

Schmidt, Kjeld, and Liam J. Bannon. 1992. Taking CSCW seriously: Supporting articulation work. Computer Supported Cooperative Work 1, no. 1/2: 7–40.

Shneiderman, Ben, and Catherine Plaisant. 2005. Collaboration. In Designing the user interface: Strategies for effective human-computer interaction, 4th ed., 408–50. Reading, Mass.: Addison-Wesley.

Tennant, Roy. 2005. Web4Lib electronic discussion. WebJunction.org. http://lists.webjunction.org/web4lib/ (accessed June 12, 2007). Archive is available at this link as well.

Twidale, Michael B., et al. 1997. Collaboration in physical and digital libraries. Report no. 64, British Library Research and Innovation Centre. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/bl/report/ (accessed June 12, 2007).

Twidale, Michael B., and David M. Nichols. 1998a. Using studies of collaborative activity in physical environments to inform the design of digital libraries. Technical report CSEG/11/98, Computing Department, Lancaster University, UK. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/cscw98.html (accessed June 12, 2007).

Twidale, Michael B., and David M. Nichols. 1998b. A survey of applications of CSCW for digital libraries. Technical report CSEG/4/98, Computing Department, Lancaster University, UK. http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/docs/survey.html (accessed June 12, 2007).

WebJunction. n.d. DIG_REF electronic discussion list. http://www.vrd.org/dig_ref/dig_ref.shtml (accessed June 12, 2007).

Wikipedia. 2007a. Wiki.
http://en.wikipedia.org/wiki/wiki (accessed April 29, 2007).

Wikipedia. 2007b. WikiWikiWeb. http://en.wikipedia.org/wiki/wikiwikiweb (accessed April 29, 2007).

Appendix. Wikis in libraries
I = collaboration between libraries
II = collaboration between library staff
III = collaboration between library staff and patrons
IV = collaboration between patrons

I. Library Success: A Best Practices Wiki, capturing library success stories and covering a wide variety of topics; also features a presentation about wikis (http://www.libsuccess.org/index.php?title=wiki_world). Location: http://www.libsuccess.org/. Software: MediaWiki.

I. Wiki for the school library association in Alaska. Location: http://akasl.pbwiki.com/. Software: PBwiki.

I. Wiki to support Reserves Direct, free, open-source software for managing academic reserves materials developed by Emory University. Location: http://www.reservesdirect.org/wiki/index.php/main_page. Software: MediaWiki.

I. SUNYLA New Tech Wiki, a place for State University of New York (SUNY) librarians to share how they are using information technologies to interact with patrons. Location: http://sunylanewtechwiki.pbwiki.com/. Software: PBwiki.

I. Wiki for librarians and faculty members to collaborate across campuses; being used with distance learning instructors and small groups. Location: message from Robin Shapiro on the DIG_REF electronic discussion list dated 10/18/2006. Software: unknown.

I. Discusses setting up three wikis in the last month: “one to support a preconference workshop, another for behind-the-scenes conference planning by local organizers, and one for conference attendees to use before they arrived and during the sessions” (30). Location: Fichter, Darlene. 2006. Using wikis to support online collaboration in libraries. Information Outlook 10, no. 1: 30–31.
Software: unknown.

I. Unofficial wiki for the American Library Association 2005 annual conference. Location: http://meredith.wolfwater.com/wiki/index.php?title=main_page. Software: MediaWiki.

I. Unofficial wiki for the 2005 Internet Librarian conference. Location: http://ili2005.xwiki.com/xwiki/bin/view/main/webhome. Software: XWiki.

I. Wiki for the Canadian Library Association (CLA) 2005 annual conference. Location: http://wiki.ucalgary.ca/page/cla. Software: MediaWiki.

I. Wiki for the South Carolina Library Association. Location: http://www.scla.org/governance/homepage. Software: PmWiki.

I. Wiki set up to support national discussion about institutional repositories in New Zealand. Location: http://wiki.tertiary.govt.nz/~institutionalrepositories. Software: PmWiki.

I. The Oregon Library Instruction Wiki, used for sharing information about library instruction. Location: http://instructionwiki.org/. Software: MediaWiki.

I. Personal Repositories Online Wiki Environment (PROWE), an online repository sponsored by the Open University and the University of Leicester that uses wikis and blogs to encourage the open exchange of ideas across communities of practice. Location: http://www.prowe.ac.uk/. Software: unknown.

I. LIS Wiki, a space for collecting articles and general information about library and information science. Location: http://liswiki.org/wiki/main_page. Software: MediaWiki.

I. Making of Modern Michigan, a wiki to support a statewide digital library project. Location: http://blog.lib.msu.edu/mmmwiki/index.php/main_page. Software: unknown (behind firewall).

I. Wiki used as a web content editing tool in a digital library initiative sponsored by Emory University, the University of Arizona, Virginia Tech, and the University of Notre Dame. Location: http://sunylanewtechwiki.pbwiki.com/. Software: PBwiki.

II. Wiki at SUNY Stony Brook Health Sciences Library used as a knowledge base. Location: http://appdev.hsclib.sunysb.edu/twiki/bin/view/main/webhome; a presentation can be found at http://ms.cc.sunysb.edu/%7edachase/wikisinaction.htm. Software: TWiki.

II. Wiki at York University used internally for committee work.
exploring how to use wikis as a way to collaborate with users message from mark robertson. on web4lib electronic discussion list dated 10/13/2006. unknown ii wiki for internal staff use at the university of waterloo. they utilize access control to restrict parts of the wiki to groups message from chris gray. on web4lib electronic discussion list dated 08/09/2006. unknown ii wiki at the university of toronto for internal communica tions, technical problems, and as a document repository message from stephanie walker. on librefl electronic discussion list dated 10/28/2006. unknown ii wiki used for coordination and organization of portable professor program, which appears to be a collaborative infor mation literacy program for remote faculty http://tfppcommittee.pbwiki.com/ pbwiki ii the university of connecticut libraries’ staff wiki which is a repository of information technology services documents http://wiki.lib.uconn.edu/wiki/ main_page mediawiki ii wiki used at binghamton university libraries for staff intranet. features pages for committees, documentation, policies, newsletters, presentations, and travel reports screenshots can be found at http://library.lib.binghamton.edu/ presentations/cil2006/cil%202006 _wikis.pdf mediawiki ii wiki used at the information desk at miami university described in: withers, rob. “something wiki this way comes.” c&rl news 66, no. 11 (2005): 775–77. unknown ii use of wiki as knowledge base to support reference service http://oregonstate.edu/~reeset/ rdm/ unknown ii university of minnesota libraries staff web site in wiki form https://wiki.lib.umn.edu/ pmwiki ii wiki used to support the mit engineering and science libraries bteam. the wiki may no longer be active, but is still available http://www.seedwiki.com/wiki/b team seedwiki iii a wiki that is subject guide at st. 
joseph county public library in south bend, indiana http://www.libraryforlife.org/ subjectguides/index.php/main_page mediawiki 3� information technology and libraries | september 20073� information technology and libraries | september 2007 category description location wiki software iii wiki used at the aiken library, university of south carolina as a content management system (cms) http://library.usca.edu/main/ homepage pmwiki iii doucette library of teaching resources wiki—a repository of resources for education students http://wiki.ucalgary.ca/page/ doucette mediawiki iv wiki worldcat (wikid) is an oclc pilot project (now defunct) that allowed users to add reviews to open worldcat records http://www.oclc.org/product works/wcwiki.htm unknown iii and iv wikiref lists reviews of reference resources—databases, books, web sites, etc. —created by butler librarians, faculty, staff, and students. http://www.seedwiki.com/wiki/ butler_wikiref; reported in matthies, brad, jonathan helmke, and paul slater. using a wiki to enhance library instruction. indiana libraries 25, no. 3 (2006): 32–34. seedwiki iii and iv wiki used as a subject guide at ohio university http://www.library.ohiou.edu/sub jects/bizwiki/index.php/main_page; presentation about the wiki: http://www.infotoday.com/cil2006/ presentations/c101102_boeninger .pps mediawiki evaluation and comparison of discovery tools: an update f. william chickering and sharon q. yang information technology and libraries | june 2014 5 abstract selection and implementation of a web-scale discovery tool by the rider university libraries (rul) in the 2011–2012 academic year revealed that the endeavor was a complex one. research into the state of adoption of web-scale discovery tools in north america and the evolution of product effectiveness provided a good starting point. 
in the following study, we evaluated fourteen major discovery tools (three open source and eleven proprietary), benchmarking sixteen criteria recognized as the advanced features of a “next generation catalog.” some of the features have been used in previous research on discovery tools. the purpose of the study was to evaluate and compare all the major discovery tools, and the findings serve to update librarians on the latest developments and user interfaces and to assist them in their adoption of a discovery tool.

introduction

in 2004, the rider university libraries’ (rul) strategic planning process uncovered a need to investigate federated searching as a means to support research. a tool was needed to search and access all journal titles available to rul users at that time, including 12,000+ electronic full-text journals. because federated search lacked the ability to provide relevancy ranking due to its real-time search operations, and given the cost of the products then available, the decision was made to defer implementation of federated search. monitoring developments yearly revealed no improvements strong enough to adopt the approach. by 2011, the number of electronic full-text journals had increased to 51,128, and by this time federated search as a concept had metamorphosed into web-scale discovery. clearly, the time had come to consider implementing this more advanced approach to searching the ever-growing number of journals available to our clients. though rul passed on federated searching, viewing it as too cumbersome to serve our students well, we anticipated the day when improved systems would emerge. vaughn nicely describes the ability of more highly evolved discovery systems to “provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content.”1 yang and hofmann anticipated the emergence of web-scale discovery with their evaluation of next generation catalogs.
2,3 by 2011, informed by yang and hofmann’s research, we believed that the systems in the marketplace were sufficiently evolved to make our efforts at assessing available systems worthwhile. this coincided nicely with an important objective in our strategic plan: investigate link resolvers and discovery tools for federated searching and opac by summer 2011.

author note: f. william chickering (chick@rider.edu) is dean of university libraries, rider university, lawrenceville, new jersey. sharon q. yang (yangs@rider.edu) is associate professor–librarian at moore library, rider university, lawrenceville, new jersey.

heeding alexander pope’s advice to “be not the first by whom the new are tried, nor yet the last to lay the old aside,”4 we set about discovering what systems were in use throughout north america and which features each provided.

some history

in 2006, antelman, lynema, and pace observed that “library catalogs have represented stagnant technology for close to twenty years.” better technology was needed “to leverage the rich metadata trapped in the marc record to enhance collection browsing. the promise of online catalogs has never been realized. for more than a decade, the profession either turned a blind eye to problems with the catalog or accepted that it is powerless to fix them.”6 dissatisfaction with catalog search tools led us to review the vufind discovery tool. while it had some useful features (spelling, “did you mean?” suggestions), it still suffered from inadequacies in full-text search and the cumbersome nature of searcher-designated boolean searching. it did not work well in searching printed music collections and, of course, only served as a catalog front end.
with this all in mind, rul developed a set of objectives to improve information access for clients:

• to provide information seekers with
  • an easy search option for academically valid information materials
  • an effective search option for academically valid information materials
  • a reliable search option for academically valid information materials across platforms
• to recapture student academic search activity from google
• to attempt revitalizing the use of monographic collections
• to provide an effective mechanism to support offerings of e-books
• to build a firm platform for appropriate library support of distance learning coursework

literature review

marshall breeding first discussed broad-based discovery tools in 2005, shortly after the launch of google scholar. he posits that federated search could not compete with the power and speed of a tool like google scholar. he proclaims the need for, as he describes it, a “centralized search model.”7 building on breeding’s observations four years earlier, diedrichs astutely observed in 2009 that “user expectations for complete and immediate discovery and delivery of information have been set by their experiences in the web 2.0 world. libraries must respond to the needs of those users whose needs can easily be met with google-like discovery tools, as well as those that require deeper access to our resources.”10 in that same year, dolski described the common situation in many academic libraries when, in reference to the university of nevada las vegas (unlv) library, he states, “our library website serves as the de facto gateway to our electronic, networked content offerings. yet usability studies have shown that findability, when given our website as a starting point, is poor. undoubtedly this is due, at least in part, to interface fragmentation.”11 this perfectly described the way we had come to view rul’s situation.
in 2010, breeding reviewed the systems in the market, noting that these are not just next-generation catalogs. he stressed “equal access to content in all forms,” a concept we now take for granted. a key virtue in discovery tools, he notes, is the “blending of the full text of journal articles and books alongside citation data, bibliographic, and authority records resulting in a powerful search experience. rather than being provided a limited number of access points selected by catalogers, each word and phrase within the text becomes a possible point of retrieval.” breeding further points out that: “web-scale discovery platforms will blur many of the restrictions and rules that we impose on library users. rather than having to explain to a user that the library catalog lists books and journal titles but not journal articles, users can simply begin with the concept, author, or title of interest and straightaway begin seeing results across the many formats within the library’s collection.”12 working with freshmen at rider university revealed that they are ahead of the professionals in approaching information this way, and we believed that web-scale discovery tools could help our users. as we began the process of selecting a discovery tool, we looked at the experiences of others. fabbi at the university of nevada las vegas (unlv) folded in a strong component of organizational learning in a highly structured manner that was unnecessary at rider.13 no information was disclosed on the process of selecting a discovery vendor, though the website reveals the presence of a discovery tool (http://library.nevada.edu/). in contrast, many librarians at rider explored a variety of libraries’ application of search tools. following hofmann and yang’s work, a process of vendor demonstrations and analysis of feasibility led to a trial of ebsco discovery service.
what we hoped for is what way at grand valley state reported in 2010 of his analysis of serials solutions’ summon:

an examination of usage statistics showed a dramatic decrease in the use of traditional abstracting and indexing databases and an equally dramatic increase in the use of full text resources from full text database and online journal collections. the author concludes that the increase in full text use is linked to the implementation of a web-scale discovery tool.14

method

understanding both rul’s objectives and the state of the art as reflected in the literature, we concluded that an up-to-date review of discovery tool adoptions was in order before moving forward in the process of selecting a product. the resulting study included these steps: (1) compiling a list of all the major discovery tools, (2) developing a set of criteria for evaluation, (3) examining four to seven websites where a discovery tool was deployed and evaluating each tool against each criterion, (4) recording the findings, and (5) analyzing the data. the targeted population for the study included all the major discovery tools in use in the united states. we define a discovery tool as a library user interface independent of any library systems. a discovery tool can be used to replace the opac module of an integrated library system or live side-by-side with the opac. other names for discovery tools include stand-alone opac, discovery layer, or discovery user interface. lately, a discovery tool is more often called a discovery service because most are becoming subscription-based and reside remotely in a cloud-based saas (software as a service) model.
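the record-keeping step described above (a spreadsheet with one field per criterion, marking each feature as present or absent) can be sketched as a small presence/absence matrix. this is only an illustrative sketch: the tool names match the study, but the criterion values shown are invented, not the study's actual findings.

```python
# illustrative sketch of the study's presence/absence matrix.
# criterion values below are made up, not the study's findings.
CRITERIA = [
    "one-stop search", "modern interface",
    "enriched content", "faceted navigation",
]

# each tool maps to the set of criteria observed as present
observations = {
    "summon": {"one-stop search", "modern interface", "faceted navigation"},
    "vufind": {"modern interface", "faceted navigation"},
}

def feature_row(tool):
    """Render one tool as a yes/no row, one cell per criterion."""
    present = observations[tool]
    return ["yes" if c in present else "no" for c in CRITERIA]

for tool in sorted(observations):
    print(tool, feature_row(tool))
```

analyzing the data then reduces to counting "yes" cells per row (per tool) or per column (per criterion).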
the authors compiled a list of fourteen discovery tools based on marshall breeding’s “major discovery products” guide published in “library technology guides.”15 those included aquabrowser library, axiell arena, bibliocommons (bibliocore), blacklight, ebsco discovery service, encore, endeca, extensible catalog, sirsidynix enterprise, primo, summon, visualizer, vufind, and worldcat local. two open-source discovery layers, sopac (the social opac) and scriblio, were excluded from this study because very few libraries are using them. for evaluation in this study, academic libraries were preferred over public libraries during the sample selection process. however, some discovery tools, such as bibliocommons, were more popular among public libraries. therefore examples of public library websites were included in the evaluation. the sites that made the final list were chosen either from the vendor’s website that maintained a customer list or breeding’s “library technology guides.”16 the following is the final list of libraries whose implementations were used in the study.

example library sites with proprietary discovery tools:

aquabrowser (serials solutions)
1. allen county public library at http://smartcat.acpl.lib.in.us/
2. gallaudet university library at http://discovery.wrlc.org/?skin=ga
3. harvard university at http://lib.harvard.edu/
4. norwood young america public library at http://aquabrowser.carverlib.org/
5. selco southeastern libraries cooperating at http://aquabrowser.selco.info/?c_profile=far
6. university of edinburgh (uk) at http://aquabrowser.lib.ed.ac.uk/

axiell arena (axiell)
1. doncaster council libraries (uk) at http://library.doncaster.gov.uk/web/arena
2. lerums bibliotek (lerums library, sweden) at http://bibliotek.lerum.se/web/arena
3. london libraries consortium-royal kingston library (uk) at http://arena.yourlondonlibrary.net/web/kingston
4. norddjurs (denmark) at https://norddjursbib.dk/web/arena/
5. north east lincolnshire libraries (uk) at http://library.nelincs.gov.uk/web/arena
6. someron kaupunginkirjasto (finland) at http://somero.verkkokirjasto.fi/web/arena
7. syddjurs (denmark) at https://bibliotek.syddjurs.dk/web/arena1

bibliocore (bibliocommons)
1. halton hills public library at http://hhpl.bibliocommons.com/dashboard
2. new york public library at http://nypl.bibliocommons.com/
3. oakville public library at http://www.opl.on.ca/
4. princeton public library at http://princetonlibrary.bibliocommons.com/
5. seattle public library at http://seattle.bibliocommons.com/
6. west perth (australia) public library at http://wppl.bibliocommons.com/dashboard
7. whatcom county library system at http://wcls.bibliocommons.com/

ebsco discovery service/eds (ebsco)
1. aston university (uk) at http://www1.aston.ac.uk/library/
2. columbia college chicago library at http://www.lib.colum.edu/
3. loyalist college at http://www.loyalistlibrary.com/
4. massey university (new zealand) at http://www.massey.ac.nz/massey/research/library/library_home.cfm
5. rider university at http://www.rider.edu/library
6. santa rosa junior college at http://www.santarosa.edu/library/
7. st. edward's university at http://library.stedwards.edu/

encore (innovative interfaces)
1. adelphi university at http://libraries.adelphi.edu/
2. athens state university library at http://www.athens.edu/library/
3. california state university at http://coast.library.csulb.edu/
4. deakin university (australia) at http://www.deakin.edu.au/library/
5. indiana state university at http://timon.indstate.edu/iii/encore/home?lang=eng
6. johnson and wales university at http://library.uri.edu/
7. st. lawrence university at http://www.stlawu.edu/library/

endeca (oracle)
1. john f. kennedy presidential library and museum at http://www.jfklibrary.org/
2. north carolina state university at http://www.lib.ncsu.edu/endeca/
3. phoenix public library at http://www.phoenixpubliclibrary.org/
4. triangle research libraries network at http://search.trln.org/
5. university of technology, sydney (australia) at http://www.lib.uts.edu.au/
6. university of north carolina at http://search.lib.unc.edu/
7. university of ottawa (canada) libraries at http://www.biblio.uottawa.ca/html/index.jsp?lang=en

enterprise (sirsidynix)
1. cerritos college at http://cert.ent.sirsi.net/client/cerritos
2. maricopa county community colleges at https://mcccd.ent.sirsi.net/client/default
3. mountain state university/university of charleston at http://msul.ent.sirsi.net/client/default
4. university of mary at http://cdak.ent.sirsi.net/client/uml
5. university of the virgin islands at http://uvi.ent.sirsi.net/client/default
6. western iowa tech community college at http://wiowa2.ent.sirsi.net/client/default

primo (ex libris)
1. aberystwyth university (uk) at http://primo.aber.ac.uk/
2. coventry university (uk) at http://locate.coventry.ac.uk/
3. curtin university (australia) at http://catalogue.curtin.edu.au/
4. emory university at http://web.library.emory.edu/
5. new york university at http://library.nyu.edu/
6. university of iowa at http://www.lib.uiowa.edu/
7. vanderbilt university at http://www.library.vanderbilt.edu

visualizer (vtls)
1. blinn college at http://www.blinn.edu/library/index.htm
2. edward via virginia college of osteopathic medicine at http://vcom.vtls.com:1177/
3. george c. marshall foundation at http://gmarshall.vtls.com:6330/
4. scugog memorial public library at http://www.scugoglibrary.ca/

summon (serials solutions)
1. arizona state university at http://lib.asu.edu/
2. dartmouth college at http://dartmouth.summon.serialssolutions.com/
3. duke university at http://library.duke.edu/
4. florida state university at http://www.lib.fsu.edu/
5. liberty university at http://www.liberty.edu/index.cfm?pid=178
6. university of sydney at http://www.library.usyd.edu.au/

worldcat local (oclc)
1. boise state university at http://library.boisestate.edu/
2. bowie state university at http://www.bowiestate.edu/academics/library/
3. eastern washington university at http://www.ewu.edu/library.xml
4. louisiana state university at http://lsulibraries.worldcat.org/
5. saint john's university at http://www.csbsju.edu/libraries.htm
6. saint xavier university at http://lib.sxu.edu/home

examples of open source and free discovery tools:

blacklight (the university of virginia library)
1. columbia university at http://academiccommons.columbia.edu/
2. johns hopkins university at https://catalyst.library.jhu.edu/
3. north carolina state university at http://historicalstate.lib.ncsu.edu
4. northwestern university at http://findingaids.library.northwestern.edu/
5. stanford university at http://www-sul.stanford.edu/
6. university of hull (uk) at http://blacklight.hull.ac.uk/
7. university of virginia at http://search.lib.virginia.edu/

extensible catalog/xc (extensible catalog organization/carli/university of rochester)
1. demo at http://extensiblecatalog.org/xc/demo
2. extensible catalog library at http://xco-demo.carli.illinois.edu/dtmilestone3
3. kyushu university (japan) at http://catalog.lib.kyushu-u.ac.jp/en
4. spanish general state authority libraries (spain) at http://pcu.bage.es/
5. thailand cyber university/asia institute of technology (thailand) at http://globe.thaicyberu.go.th/

vufind (villanova university)
1. auburn university at http://www.lib.auburn.edu/
2. carnegie mellon university libraries at http://search.library.cmu.edu/vufind/search/advanced
3. colorado state university at http://lib.colostate.edu/
4. saint olaf college at http://www.stolaf.edu/library/index.cfm
5. university of michigan at http://mirlyn.lib.umich.edu
6. western michigan university at https://catalog.library.wmich.edu/vufind/
7. yale university library at http://yufind.library.yale.edu/yufind/

the following list of criteria was used for the purpose of the evaluation. some were based on those used by previous studies on discovery tools.17,18,19 the list embodied the librarians’ vision for the next-generation catalog and contained some of the most desirable features for a modern opac. the authors were aware of other desirable features for a discovery layer, and the following list was by no means the most comprehensive, but it served the purpose of the study well.

1. one-stop search for all library resources.
a discovery tool should include all library resources in its search, including the catalog with books and videos, journal articles in databases, and local archives and digital repositories. this can be accomplished by a unified index or federated search, an essential component for a discovery tool. some of the discovery tools are described as web-scale because of their potential to search seamlessly across all library resources.

2. state-of-the-art web interface. a discovery tool should have a modern design similar to e-commerce sites, such as google, netflix, and amazon.

3. enriched content. discovery tools should include book cover images, reviews, and user-driven input, such as comments, descriptions, ratings, and tag clouds. the enriched content can come from library patrons, commercial sources, or both.

4. faceted navigation. discovery tools should allow users to narrow down the search results by categories, also called facets. the commonly used facets include locations, publication dates, authors, formats, and more.

5. simple keyword search box with a link to advanced search at the start page. a discovery tool should start with a simple keyword search box that looks like that of google or amazon. a link to the advanced search should be present.

6. simple keyword search box on every page. the simple keyword search box should appear on every page of a discovery tool.

7. relevancy. relevancy-ranking criteria should take into consideration circulation statistics and books with multiple copies. more frequently circulated books indicate popularity and usefulness, and they should be ranked at the top of the display. a book with multiple copies may also be an indication of importance.

8. “did you mean . . . ?” spell-checking.
when an error appears in the search, the discovery tool should present the corrected query spelling as a link so that users can simply click on it to get the search results.

9. recommendations/related materials. a discovery tool should recommend resources for readers in a similar manner to amazon or other e-commerce sites, based on transaction logs. this should take the form of “readers who borrowed this item also borrowed the following . . . ” or a link to recommended readings. it would be ideal if a discovery tool could recommend the most popular articles, a service similar to ex libris’ bx usage-based services.

10. user contribution. user input includes descriptions, summaries, reviews, criticism, comments, rating and ranking, and tagging or folksonomies.

11. rss feeds. a modern opac should provide rss feeds.

12. integration with social networking sites. when a discovery tool is integrated with social networking sites, patrons can share links to library items with their friends on social networks like twitter, facebook, and delicious.

13. persistent links. records in a discovery tool contain a stable url capable of being copied and pasted and serving as a permanent link to that record. these are also called permanent urls.

14. auto-completion/stemming. a discovery tool is equipped with a computational algorithm that can auto-complete the search words or supply a list of previously used words or phrases for users to choose from. google has stemming algorithms.

15. mobile compatibility. there is a difference between being “mobile compatible” and a “custom mobile website.” the former indicates a website can be viewed or used on a mobile phone, and the latter denotes a different version of the user interface specially built for mobile use. in this study we include both as “yes.”

16. functional requirements for bibliographic records (frbr). the latest development of rda certainly makes a discovery tool more desirable if it can display frbr relationships.
for instance, a discovery tool may display and link different versions, editions, or formats of a work, what frbr refers to as expressions and manifestations.

for record keeping and analysis, a microsoft excel file with sixteen fields based on the above criteria was created. the authors checked the discovery tools on the websites of the selected libraries and recorded those features as present or absent. rda compatibility was not used as a criterion in the study because most discovery tools allow users to add rda fields in marc. by now, all the discovery tools should be able to display, index, and search the new rda fields.

findings

one-stop searching for all library resources—this is the most desirable feature when acquiring a discovery tool. unfortunately it also presented the biggest challenge for vendors. both librarians and vendors have been struggling with this issue for the past several years, yet no one has worked out a perfect solution. based on the examples the authors examined, this study found that only five out of fourteen discovery tools can retrieve articles from databases along with books, videos, and digital repositories. those include ebsco discovery service, encore, primo, summon, and worldcat local. whereas encore uses an approach similar to federated search, performing live searches of databases, the other discovery tools build a single unified index. while the single unified index requires libraries to send their catalog data and local information to the vendor for update, and thus the discovery tools may fall behind in reflecting up-to-the-minute accuracy in local holdings, federated search does real-time searching and does not lag behind in displaying current information. both approaches are limited in what they cover.
both need permission from content providers, either for inclusion in the unified index or to develop a connection to article databases for real-time searching. those discovery tools that do not have their own unified index or real-time searching capability provide web-scale searching through other means. for instance, vufind has developed connectors to application programming interfaces (apis) by serials solutions or oclc to pull search results from summon and worldcat local. encore not only developed its own real-time connection to electronic databases but is enhancing its web-scale search by incorporating the unified index from other discovery tools such as the ebsco discovery service. aquabrowser is augmented by 360 federated search for the same purpose. despite those possibilities, the authors did not find article-level retrieval in the sample discovery tools other than the main five mentioned above. comparing the coverage of each tool’s web-scale index can be challenging. ebsco, summon, and worldcat local publicize their content coverage on the web, while primo and encore only share this information with their customers. this makes it hard to compare and evaluate content coverage without contacting vendors and asking for that information. at present, none of the five discovery tools (ebsco discovery service, encore, primo, summon, and worldcat local) can boast 100% coverage of all library resources. in fact, none of the internet search engines, including google or google scholar, can retrieve 100% of all resources. therefore web-scale searching is more a goal than a reality. apart from political and economic reasons, this is in part due to the nonbibliographic structure of the contents in databases such as scifinder and some others.
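the two architectures contrasted above can be sketched in miniature. this is a hedged illustration only: the source names and records below are entirely hypothetical, and real systems negotiate access with content providers rather than iterating over local lists.

```python
# illustrative contrast of the two approaches: a unified index is
# built ahead of time and searched locally, while federated search
# queries each source at request time. all data here is hypothetical.
SOURCES = {
    "catalog":  ["intro to library science", "cataloging rules"],
    "articles": ["web-scale discovery adoption", "library catalogs"],
}

# unified-index approach: merge everything up front. as noted above,
# the merged copy may lag behind the live sources, but one search
# covers all of them at once.
unified_index = [
    (src, title) for src, titles in SOURCES.items() for title in titles
]

def search_unified(term):
    return [(s, t) for s, t in unified_index if term in t]

# federated approach: hit each source live, then merge the results.
# always current, but one round-trip per source at query time.
def search_federated(term):
    results = []
    for src, titles in SOURCES.items():  # one live query per source
        results += [(src, t) for t in titles if term in t]
    return results
```

on this toy data both return the same hits; the trade-off in practice is freshness (federated) versus speed and unified relevancy ranking (single index).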
one-stop searching is still a work in progress because discovery tools give students a quick and simple way to retrieve a large number, but still an incomplete list, of resources held by a library. for more in-depth research, students are still encouraged to search the catalog, discipline-specific databases, and digital repositories separately.

information technology and libraries | june 2014 15

state-of-the-art interface—all the discovery tools are very similar in appearance to amazon.com. some are better than others. this study did not rate each discovery tool on a scale and thus did not distinguish fine degrees of difference in appearance; rather, each discovery tool was given a "yes" or "no" based on subjective judgment. all the discovery tools received "yes" because they are very similar in appearance.

enriched content—all the discovery tools have embedded book cover or video jacket images, but some display more, such as ratings and rankings, user-supplied or commercially available reviews, overviews, previews, comments, descriptions, title discussions, excerpts, and age suitability, to name a few. a discovery tool may display enriched content by default out of the box, but some need to be customized to include it. figure 1 lists the enriched content the authors found implemented in each discovery tool in the sample; the number in the last column indicates how many types of enriched content were found at the time of the study. bibliocommons and aquabrowser stand out from the rest and made the top two on the list based on the amount of enriched content from noncataloging sources (see figure 1). it is debatable how much nontraditional data a discovery tool should incorporate into its display, and it warrants another discussion as to how useful such data is for users.

faceted navigation—faceted navigation has become a standard feature in discovery tools over the last two years.
it allows users to subdivide search results into subsets based on predetermined terms. facets come from a variety of fields in marc records, and some discovery tools have more facets than others. the most commonly seen facets include location or collection, publication date, format, author, genre, and subject. faceted navigation is highly configurable, as many discovery tools allow libraries to decide on their own facets, and it has become an integral part of a discovery tool.

simple keyword search box on the starting page with a link to advanced search—the original idea is to allow a library's user interface to resemble google by displaying a simple keyword search box with a link to advanced search on the starting page. most discovery tools give libraries the flexibility to choose or reject this option. however, many librarians find this approach unacceptable, as they feel it lacks precision in searching and thus may mislead users. as the keyword box is highly configurable and it is up to the library to decide how to present it, many libraries have added a pull-down menu with options to search keywords, authors, titles, and locations. in doing so, the original intention of a google-like simple search box is lost, and only a few libraries follow the google-like box style on the starting page. most libraries altered the simple keyword search box on the starting page to include a drop-down menu or radio buttons, so the box is neither simple nor limited to keyword search only. nevertheless, this study gave all the discovery tools a "yes": all the systems are capable of this feature even though libraries may choose not to use it.
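the faceted navigation described above amounts to counting, for each facet field, how many hits fall under each value so the sidebar can show entries such as "format: book (2)". the record shape and field names below are assumptions made for the sketch, not any discovery tool's actual data model.

```python
# minimal facet-count sketch: tally each facet value across a result set.
from collections import Counter

def facet_counts(records, fields):
    """return {field: Counter({value: hits})} for the given facet fields."""
    counts = {f: Counter() for f in fields}
    for rec in records:
        for f in fields:
            value = rec.get(f)
            if value:
                counts[f][value] += 1
    return counts

# toy search results standing in for marc-derived records
hits = [
    {"title": "romeo and juliet", "format": "book", "location": "main"},
    {"title": "romeo + juliet",   "format": "dvd",  "location": "main"},
    {"title": "west side story",  "format": "book", "location": "branch"},
]
print(facet_counts(hits, ["format", "location"]))
```

clicking a facet in a real interface simply reruns the search filtered to records whose field matches the chosen value.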
rank | discovery tool | enriched content | total
1 | bibliocommons | cover images, tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and rating | 11
2 | aquabrowser | cover images, previews, reviews, summary, excerpts, tags, author notes & sketches, full text from google, rating/ranking | 9
3 | enterprise | cover images, reviews, google previews, summary, excerpts | 5
4 | axiell arena | cover images, tags, reviews, and title discussion | 4
4 | vufind | cover images, tags, reviews, comments | 4
5 | primo | cover images, tags, previews | 3
5 | worldcat local | cover images, tags, reviews | 3
6 | encore | cover images, tags | 2
6 | visualizer | cover images, reviews | 2
6 | summon | cover images, reviews | 2
7 | blacklight | cover images | 1
7 | ebsco discovery service | cover images | 1
7 | endeca | cover images | 1
7 | extensible catalog | cover images | 1
figure 1. the ranked list of enriched content in discovery tools.

simple keyword search box on every page—this feature enables a user to start a new search at every step of navigation in the discovery tool. most of the discovery tools provide such a box at the top of the screen as users navigate through search results and record displays, except extensible catalog and enterprise by sirsidynix. the feature is missing from the former, while the latter almost has it except when displaying bib records in a pop-up box.

relevancy—traditionally, relevancy is uniformly based on a computer algorithm that calculates the frequency and relative position of a keyword (field weighting) in a record and displays the search results based on the final score. other factors have never been part of the decision in the display of search results. in the discussion of next-generation catalogs, relevancy based on circulation statistics and other factors came up as a desirable possibility, and no discovery tool had met this challenge until now.
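the traditional relevancy calculation just described, keyword frequency weighted by the field it appears in, can be sketched in a few lines, together with the kind of usage-based boost discussed next. the field weights, record shape, and boost formula here are illustrative assumptions, not any vendor's published algorithm.

```python
# sketch of field-weighted keyword relevancy with an optional usage boost.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "description": 1.0}

def relevancy(record, keyword, views=0, popularity_weight=0.0):
    """score = sum over fields of (field weight x keyword occurrences),
    plus an optional boost proportional to how often the record was viewed."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = record.get(field, "").lower()
        score += weight * text.split().count(keyword.lower())
    return score + popularity_weight * views

rec = {"title": "romeo and juliet",
       "subject": "drama",
       "description": "the tragedy of romeo"}

print(relevancy(rec, "romeo"))                                # fields only
print(relevancy(rec, "romeo", views=8, popularity_weight=0.5))  # with usage boost
```

setting `popularity_weight` to zero reproduces the purely field-weighted score; a nonzero weight lets heavily viewed records rise, in the spirit of primo's popularity ranking described below.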
primo by ex libris is the only one among the discovery tools under investigation that can sort the final results by popularity. "primo's popularity ranking is calculated by use. this means that the more an item record has been clicked and viewed, the more popular it is."20 even though these are not real circulation statistics, this is considered a revolutionary step and a departure from traditional relevancy; three years ago none of the discovery tools provided this option.21 to make relevancy ranking even more sophisticated, scholarrank, another service by ex libris, can work with primo to sort search results based not only on the query match but also on an item's value score (its usage and number of citations) and a user's characteristics and information needs. this shows the possibility of more advanced relevancy ranking in discovery tools, and other vendors will most likely follow, incorporating more sophistication into their relevancy algorithms.

spell checker/"did you mean . . . ?"—the most commonly observed way of correcting a misspelling in a query is "did you mean . . . ?," but there are other variations providing the same or similar services, some of them very user-friendly. figure 2 lists the different responses when a user enters misspelled words; "xxx" represents the keyword being searched.

discovery tool | response to misspelled search words | notes
aquabrowser | did you mean to search: xxx, xxx, xxx? | the suggested words are hyperlinks that execute new searches.
axiell arena | your original search for xxx has returned no hits. the fuzzy search returned n hits. | automatically displays a list of hits based on fuzzy logic; "n" is a number.
bibliocommons | did you mean xxx (n results)? | displays the suggested word along with the number of results as a link.
blacklight | no records found. |
no spell checker, but possible to add by the local technical team.
ebsco discovery service | results may also be available for xxx. | the suggested word is a link that executes a new search.
encore | did you mean xxx? | the suggested word is a link that executes a new search.
endeca | did you mean xxx? | the suggested word is a link that executes a new search.
enterprise | did you mean xxx? | the suggested word is a link that executes a new search.
extensible catalog | sorry, no results found for: xxx. | no spell checker, but possible to add by the local technical team.
primo | did you mean xxx? | the suggested word is a link that executes a new search.
summon | did you mean xxx? | the suggested word is a link that executes a new search.
visualizer | did you mean xxx? | the suggested word is a link that executes a new search.
vufind | 1. no results found in this category. search alternative words: xxx, xxx, xxx. 2. perhaps you should try some spelling variation: xxx, xxx, xxx. 3. your search xxx did not match any resources. what should i do now? | 1. alternative words are links that execute new searches. 2. suggested words are links that execute new searches. 3. a list of suggestions of what to do next, including checking a web dictionary.
worldcat local | did you mean xxx? | the suggested word is a link that executes a new search.
figure 2. spell checker.

most of the discovery tools on the list provide this feature except blacklight and extensible catalog. open-source solutions sometimes provide a framework to which features are added, which leaves many possibilities for local developers: for instance, a dictionary or spell checker may be easily installed even if a discovery tool does not come with one out of the box. this feature may also be configurable.
recommendation—amazon has one of those search engines with a recommendation system such as "customers who bought item a also bought item b." ecommerce recommendation algorithms analyze the activities of shoppers on the web and build a database of buyer profiles, and recommendations are made based on shopper behavior. applied to library content, this could become "readers who were interested in item a were also interested in item b." however, most discovery tools do not have such a recommendation system and have adopted different approaches instead, most making recommendations from bibliographic data in marc records, such as subject headings, for similar items. primo is one of the few discovery tools with a recommendation system similar to those used by amazon and other internet commercial sites. its bx article recommender service is based on usage patterns collected from its link resolver, sfx. developed by ex libris, bx is an independent service that integrates well with primo but can also serve as an add-on for other discovery tools. bx is an excellent example of how discovery tools can suggest new leads and directions for scholars in their research. the authors counted all the discovery tools that provide some kind of recommendation, regardless of whether the technological approach uses marc data or algorithms. ten out of fourteen discovery tools provide this feature in various forms (see figure 3): axiell arena, bibliocommons, ebsco discovery service, encore, endeca, extensible catalog, primo, summon, worldcat local, and vufind. figure 3 shows some of the recommendation language found in those discovery tools. the authors did not find any recommendations in the libraries that use aquabrowser, enterprise, visualizer, or blacklight.
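the usage-based approach described above ("readers interested in item a were also interested in item b") can be sketched by counting how often two items appear in the same user session and recommending the most frequent co-occurring items. the session data and function names are invented for the example; real services like bx work from much richer link-resolver logs.

```python
# co-occurrence recommendation sketch: items viewed in the same session
# are counted as related, and the strongest co-occurrences are suggested.
from collections import Counter
from itertools import permutations

def build_cooccurrence(sessions):
    """count, for each item, how often every other item shared a session."""
    co = {}
    for items in sessions:
        for a, b in permutations(set(items), 2):
            co.setdefault(a, Counter())[b] += 1
    return co

def recommend(co, item, n=2):
    """return up to n items most often co-viewed with the given item."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]

sessions = [
    ["hamlet", "macbeth"],
    ["hamlet", "macbeth", "othello"],
    ["hamlet", "king lear"],
]
print(recommend(build_cooccurrence(sessions), "hamlet"))
```

marc-based "more like this" features differ from this: they match shared subject headings in bibliographic records rather than mining usage, which is why the article distinguishes the two approaches.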
discovery tool | language used for recommending or linking to related items
axiell arena | "see book recommendations on this topic"; "who else writes like this?"
bibliocommons | "similar titles & subject headings & lists that include this title"
ebsco discovery service | "find similar results"
encore | "other searches you may try"; "additional suggestions"
endeca | "recommended titles for . . . view all recommended titles that match your search"; "more like this"
extensible catalog | "more like this"; "searches related to . . ."
primo | "suggested new searches by this author"; "suggested new searches by this subject"; "users interested in this article also expressed an interest in the following:"
summon | "searches related to . . ."
worldcat local | "more like this"; "similar items"; "related subjects"; "user lists with this item"
vufind | "more like this"; "similar items"; "suggested topics"; "related subjects"
figure 3. language used for recommendation.

some discovery tool recommendations are designed in a more user-friendly manner than others, and most recommendations exist exclusively for items. ideally, a discovery tool should provide an article recommendation system like ex libris's usage-based bx service, which shows users the most frequently used and most popular articles. at the time of this evaluation, no discovery tool had incorporated an article recommendation system except primo. research is needed to evaluate how patrons utilize recommendation services and whether they find recommendations in discovery tools beneficial.

user contribution—traditionally, bibliographic data has been safely guarded by cataloging librarians for quality control, and it was unthinkable that users would be allowed to add data to library records. the internet has brought new perspectives on this issue. half of the discovery tools (seven) under evaluation provide this feature to varying degrees (see figure 4).
designed primarily for public libraries, bibliocommons seems the most open to user-supplied data among all the discovery tools. many other discovery tools (seven) allow users to contribute tags and reviews, and all of them allow librarians to censor user-supplied data before releasing it for public display. figure 4 summarizes the types of data these discovery tools allow users to enter.

ranking | discovery tool | user contribution
1 | bibliocommons | tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and ratings (10)
2 | aquabrowser | tags, reviews, and ratings/rankings (3)
2 | axiell arena | tags, reviews, and title discussions (3)
2 | vufind | tags, reviews, comments (3)
3 | primo | tags and reviews (2)
3 | worldcat local | tags and reviews (2)
4 | encore | tags (1)
5 | blacklight, endeca, enterprise, extensible catalog, summon, visualizer | (0)
figure 4. discovery tools based on user contribution.

past research indicates that folksonomies, or tags, are highly useful.22 they complement library-controlled vocabularies, such as the library of congress subject headings, and increase access to library collections. a few discovery tools allow user-entered tags to form "word clouds," in which the relative importance of a tag is emphasized by font color and size. a tag list is another way to organize and display tags. in both cases, tags are hyperlinked to a relevant list of items; some tags serve as keywords to start new searches, while others narrow search results. only four discovery tools, aquabrowser, encore, primo, and worldcat local, provide both tag clouds and lists; bibliocommons provides only tag lists for the same purpose, and the rest of the discovery tools have neither. one setback of user-supplied tags for subject access is their incomplete nature.
tags may lead users to partial retrieval of information, as users add tags only to items that they have used; the coverage is not systematic or inclusive of all collections. therefore, data supplied by users in discovery tools remains controversial. it is possible to seed systems with folksonomies using services like librarything for libraries, which could reduce the impact of this issue.

rss feeds/email alerts—this feature can automatically send a list of new library resources to users based on their search criteria. it can be useful for experienced researchers or frequent library users, and some discovery tools may use email alerts as well. eight out of fourteen discovery tools in this evaluation provide rss feeds: aquabrowser, axiell arena, ebsco discovery service, endeca, enterprise, primo, summon, and vufind. an rss feed can be added as a plug-in in some discovery tools if it does not come as part of the base system.

integration with social networking sites—as most college students participate in social networking sites, this feature provides an easy way for them to share resources. users can click an icon in the discovery tool to place a link to a resource and share it with friends on facebook, twitter, delicious, and many other social networking sites. nine out of the fourteen discovery tools provide this feature, and some offer integration with many more social networking sites than others: aquabrowser, axiell arena, bibliocommons, ebsco discovery service, encore, endeca, primo, worldcat local, and extensible catalog. so far, the interaction between discovery tools and social networking sites is limited to sharing resources; social networking sites should be carefully evaluated for the possibility of integrating some of their popular features into discovery tools.
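the rss-feed feature described above boils down to serializing the new items matching a saved search as an rss 2.0 document. the sketch below uses only the python standard library; the item fields, search label, and urls are made up for the example and are not any discovery tool's real output.

```python
# minimal rss 2.0 sketch for a "new items matching your search" feed.
import xml.etree.ElementTree as ET

def new_items_feed(search, items):
    """build an rss 2.0 document listing new items for a saved search."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"new items for: {search}"
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")

feed = new_items_feed("shakespeare", [
    {"title": "romeo and juliet (new edition)", "link": "https://example.org/rec/1"},
])
print(feed)
```

an email-alert variant would render the same item list into a message body instead of xml; the underlying saved-search query is identical.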
persistent link—this is also called a permanent link or permurl. not all the links displayed in a browser's location box are persistent links, so some discovery tools specifically provide a link in the record for users to copy and keep. five out of fourteen discovery tools explicitly list this link in records: aquabrowser, axiell arena, blacklight, ebsco discovery service, and worldcat local. the authors marked a system as "no" when a permanent link is not prominently displayed in the discovery tool; in other words, only those discovery tools that explicitly provide a persistent link are counted as "yes." however, the url in a browser's location box during the display of a record may serve as a persistent link in some cases. for instance, vufind does not provide a permanent url in the record but indicates on the project site that the url in the location box is a persistent link.

auto-completion/stemming—when a user types keywords in the search box, the discovery tool supplies a list of words or phrases that he or she can choose from readily. this is a highly useful feature that google excels at. stemming not only automatically completes the spelling of a keyword but also supplies a list of phrases that point to existing items. the authors found this feature in six out of fourteen discovery tools: axiell arena, endeca, enterprise, extensible catalog, summon, and worldcat local.

mobile interface—the terms "mobile compatible" and "mobile interface" are two different concepts. a mobile interface is a simplified version of the normal browser version of a discovery tool interface, optimized for use on mobile phones, and the authors counted only those discovery tools that have a separate mobile interface. a discovery tool may be mobile friendly or compatible without having a separate mobile interface.
many discovery tools, such as ebsco, can detect a request from a mobile phone and automatically direct it to the mobile interface. eleven out of fourteen claim to provide a separate mobile interface; blacklight, enterprise, and extensible catalog do not seem to have one, even though they may be mobile friendly.

frbr—frbr groupings denote the relationships between works, expressions, manifestations, and items. for instance, a search will retrieve not only a title but also different editions and formats of the work. only three discovery tools can display frbr relationships: extensible catalog (open source), primo by ex libris, and worldcat local by oclc. so far, most discovery tools are not capable of displaying the manifestations and expressions of a work in a meaningful way. from the user's point of view, this feature is highly desirable. figure 5 is a screenshot from primo demonstrating the display of a large number of different adaptations of the work "romeo and juliet," and figure 6 displays the same intellectual work in different manifestations such as dvd, vhs, books, and more.

figure 5. display of frbr relationships in primo.
figure 6. different versions of the same work in primo.

summary

the following are the summary tables of our comparison and evaluation. proprietary and open-source programs are listed separately in these tables, and the total number of features the authors found in a particular discovery tool is displayed at the end of each column. proprietary discovery tools seem to have more of the advanced characteristics of a modern discovery tool than their open-source counterparts. the open-source program blacklight displays fewer advanced features but seems flexible for users to add features. see figures 7, 8, and 9.

figure 7. proprietary discovery tools.
criterion | aquabrowser | axiell arena | bibliocommons | ebsco/eds | encore | endeca
1. single point of search | no | no | no | yes | yes | no
2. state-of-the-art interface | yes | yes | yes | yes | yes | yes
3. enriched content | yes | yes | yes | yes | yes | yes
4. faceted navigation | yes | yes | yes | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes | yes | yes | yes
6. simple keyword search box on every page | yes | yes | yes | yes | yes | yes
7. relevancy | no | no | no | no | no | no
8. spell checker/"did you mean . . . ?" | yes | yes | yes | yes | yes | yes
9. recommendation | no | yes | yes | yes | yes | yes
10. user contribution | yes | yes | yes | no | yes | no
11. rss | yes | yes | no | yes | no | yes
12. integration with social network sites | yes | yes | yes | yes | yes | yes
13. persistent links | yes | yes | no | yes | no | no
14. stemming/auto-complete | no | yes | no | no | no | yes
15. mobile interface | yes | yes | yes | yes | yes | yes
16. frbr | no | no | no | no | no | no
total | 11/16 | 13/16 | 10/16 | 12/16 | 11/16 | 11/16

criterion | enterprise | primo | summon | visualizer | worldcat local
1. single point of search | no | yes | yes | no | yes
2. state-of-the-art interface | yes | yes | yes | yes | yes
3. enriched content | yes | yes | yes | yes | yes
4. faceted navigation | yes | yes | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes | yes | yes
6. simple keyword search box on every page | no | yes | yes | yes | yes
7. relevancy | no | yes | no | no | no
8. spell checker/"did you mean . . . ?" | yes | yes | yes | yes | yes
9. recommendation | no | yes | yes | no | yes
10. user contribution | no | yes | no | no | yes
11. rss | yes | yes | yes | no | no
12. integration with social network sites | no | yes | no | no | yes
13. persistent links | no | no | no | no | yes
14. stemming/auto-complete | yes | no | yes | no | yes
15. mobile interface | no | yes | yes | yes | yes
16. frbr | no | yes | no | no | yes
total | 7/16 | 14/16 | 11/16 | 7/16 | 14/16
figure 8. proprietary discovery tools (continued).

criterion | blacklight | extensible catalog | vufind
1. single point of search | no | no | no
2. state-of-the-art interface | yes | yes | yes
3. enriched content | yes | yes | yes
4. faceted navigation | yes | yes | yes
5. simple keyword search box on the starting page | yes | yes | yes
6. simple keyword search box on every page | yes | yes | yes
7. relevancy | no | no | no
8. spell checker/"did you mean . . . ?" | no | no | yes
9. recommendation | no | yes | yes
10. user contribution | no | no | yes
11. rss | no | no | yes
12. integration with social network sites | no | yes | no
13. persistent links | yes | no | no
14. stemming/auto-complete | no | yes | no
15. mobile interface | no | no | yes
16. frbr | no | yes | no
total | 6/16 | 9/16 | 10/16
figure 9. free and open-source discovery tools.

as one-stop searching is the core of a discovery tool, this consideration placed five discovery tools above the rest: encore, ebsco discovery service, primo, summon, and worldcat local (see figure 10). these five are web-scale discovery services. all of them use their native unified index except encore, which has incorporated the ebsco unified index into its search. despite the great progress made in the past three years in one-stop searching, none of the discovery tools can truly search across all library resources—all of them have some limitations in their coverage of content. each unified index may cover different databases as well as overlap the others in many areas. one possible solution may lie in a hybrid approach that combines a unified index with federated search (also called real-time discovery); those old and new technologies may work well when complementing each other. it remains to be seen whether libraries will ever have one-stop searching in its true sense.

discovery tool | one-stop searching
encore | yes
ebsco discovery service | yes
primo | yes
summon | yes
worldcat local | yes
figure 10. the discovery tools capable of one-stop searching.

it is also worth mentioning that one-stop searching is a vital and central piece of discovery tools; those without a native unified index or connectors to databases for real-time searching are at a disadvantage.
therefore, discovery tools that do not provide web-scale searching are investigating various possibilities for incorporating one-stop searching. some are drawing on the unified indexes of discovery tools that have them through connectors to the application programming interfaces (apis) of those products. for instance, vufind includes connectors to the apis of a few other systems that have a unified index or vast resources, such as summon and worldcat, and blacklight may provide one-stop searching through the primo api. such a practice may present other problems, such as calculating relevancy ranking across resources that do not live in the same centralized index and thus not achieving fully balanced relevancy ranking. nevertheless, discovery tool developers are working hard to achieve one-stop searching, and as a unified index can be shared across discovery tools, more and more discovery services may offer one-stop searching in the next few years.

based on the count of the sixteen criteria in the checklist, we ranked primo and worldcat local as the top two discovery tools. based on our criteria, primo has two unique features that make it stand out: relevancy enhanced by usage statistics and value score, and the frbr relationship display. worldcat local and extensible catalog are the other two discovery tools that can display frbr relationships (see figure 11).

rank | discovery tools | number of advanced features
1 | primo and worldcat local | 14/16
2 | axiell arena | 13/16
3 | ebsco discovery service | 12/16
4 | aquabrowser, encore, and endeca | 11/16
5 | bibliocommons, summon, and vufind | 10/16
6 | extensible catalog | 9/16
7 | enterprise and visualizer | 7/16
8 | blacklight | 6/16
figure 11. ranked discovery tools.

limitations

as discovery tools are going through new releases and improvements, what is true today may be false tomorrow.
discovery tools constantly improve and evolve, and many features are not included in this evaluation, such as integration with google maps for the location of an item and user-driven acquisitions; innovations are added to discovery tools constantly. this study covers only the most common features that the library community has agreed a discovery tool should have. some open-source discovery tools may provide a skeleton of an application that leaves the code open for users to develop new features, so different implementations of an open-source discovery tool may encompass totally different features that are not part of the core application. for instance, the university of virginia developed virgo based on blacklight, adding many advanced features. thus it is quite a challenge to distinguish what comes with the software from what are local developments. this study focused on the user interface of discovery tools; content coverage, application administration, and searching capability are not included, although all three are important factors when choosing a discovery tool.

conclusion

search technology has evolved far beyond federated searching. the concept of a "next-generation catalog" has merged with this idea and spawned a generation of discovery tools bringing almost google-like power to library searching. the problems facing libraries now are the intelligent selection of a tool that fits their contexts and structuring a process to adopt and refine that tool to meet the objectives of the library. our findings indicate that primo and worldcat local have better user interfaces, displaying more advanced features of a next-generation catalog than their peers. for rul, ebsco discovery service (eds) provides something approaching the ease of google searching from either a single search box or a very powerful advanced search.
being aware of the limitations noted above, rider's libraries elected to continue displaying traditional search options in addition to what we've branded "library one search." another issue we discovered in this process is that when negotiating for a vendor-hosted test, libraries must be sure that the test period begins when the configuration is complete rather than only when the data load begins. all phases of the project took far more time than anticipated; the client institution's implementation coordinator or team needs to review progress on a daily basis and communicate often with the vendor-based implementation team. with the evaluative framework this study provides, libraries moving toward discovery tools should consider the changing capabilities of the available discovery tools to make informed choices.

references

1. jason vaughan, "investigations into library web-scale discovery services," information technology & libraries 31, no. 1 (2012): 32–33, http://dx.doi.org/10.6017/ital.v31i1.1916.
2. sharon q. yang and melissa a. hofmann, "next generation or current generation? a study of the opacs of 260 academic libraries in the usa and canada," library hi tech 29, no. 2 (2011): 266–300.
3. melissa a. hofmann and sharon q. yang, "'discovering' what's changed: a revisit of the opacs of 260 academic libraries," library hi tech 30, no. 2 (2012): 253–74.
4. alexander pope, "alexander pope quotes," http://www.brainyquote.com/quotes/authors/a/alexander_pope.html.
5. f. william chickering, "linking information technologies: benefits and challenges," proceedings of the 4th international conference on new information technologies, budapest, hungary, december 1991, http://web.simmons.edu/~chen/nit/nit%2791/019-chi.htm.
6. kristin antelman, emily lynema, and andrew k. pace, "toward a twenty-first century library catalog," information technology & libraries 25, no. 3 (2006): 128–39, http://dx.doi.org/10.6017/ital.v25i3.3342.
7. marshall breeding, "plotting a new course for metasearch," computers in libraries 25, no. 2 (2005): 27–29.
8. judith carter, "discovery: what do you mean by that?" information technology & libraries 28, no. 4 (2009): 161–63, http://dx.doi.org/10.6017/ital.v28i4.3326.
9. priscilla caplan, "on discovery tools, opacs and the motion of library language," library hi tech 30, no. 1 (2012): 108–15.
10. carol pitts diedrichs, "discovery and delivery: making it work for users," serials librarian 56, no. 1–4 (2009): 79, http://dx.doi.org/10.1080/03615260802679127.
11. alex a. dolski, "information discovery insights gained from multipac, a prototype library discovery system," information technology & libraries 28, no. 4 (2009): 173, http://dx.doi.org/10.6017/ital.v28i4.3328.
12. marshall breeding, "the state of the art in library discovery," computers in libraries 30, no. 1 (2010): 31–34.
13. jennifer l. fabbi, "focus as impetus for organizational learning," information technology & libraries 28, no. 4 (2009): 164–71, http://dx.doi.org/10.6017/ital.v28i4.3327.
14. douglas way, "the impact of web-scale discovery on the use of a library collection," serials review 36, no. 4 (2010): 214–20, http://dx.doi.org/10.1016/j.serrev.2010.07.002.
15. marshall breeding, "library technology guides: discovery products," http://www.librarytechnology.org/discovery.pl.
16. ibid.
17. sharon q. yang and kurt wagner, "evaluating and comparing discovery tools: how close are we towards next generation catalog?" library hi tech 28, no. 4 (2010): 690–709.
18. yang and hofmann, "next generation or current generation?" 266–300.
19. melissa a. hofmann and sharon q. yang, "how next-gen r u? a review of academic opacs in the united states and canada," computers in libraries 31, no. 6 (2010): 26–29.
brown library of virginia western community college, “primo-frequently asked questions,” http://www.virginiawestern.edu/library/primo -faq.php#popularity_ranking. 21. yang and wagner, “evaluating and comparing discovery tools,” 690–709. 22. yanyi lee and sharon q. yang, “folksonomies as subject access—a survey of tagging in library online catalogs and discovery layers,” paper presented at ifla post-conference “beyond libraries-subject metadata in the digital environment and semantic web ,” tallinn, estoniai, 18 august 2012, http://www.nlib.ee/html/yritus/ifla_jarel/papers/4-1_yan.docx http://athena.rider.edu:2054/eds/viewarticle?data=dgjymppp44rp2%2fdv0%2bnjisfk5ie42eik6tmvsk6k63nn5kx94um%2bsa2otkewpq9lnqe4sk%2bws0yexss%2b8ujfhvhx4yzn5eyb4rorsbguteq1r7u%2b6tfsf7vb7d7i2lt94unjho6c8nnls79mpnfsvdgmrlg2rbdjsaeusk6mtlcwnosh8opfjlvc84tq6uoq8gaa&hid=20 http://www.librarytechnology.org/discovery.pl http://www.virginiawestern.edu/library/primo-faq.php#popularity_ranking http://www.nlib.ee/html/yritus/ifla_jarel/papers/4-1_yan.docx letter from the editor kenneth j. varnum information technology and libraries | june 2018 1 in this june 2018 issue, we continue our celebration of ital’s 50th year with a summary by editorial board member sandra shores of the articles published in the 1970s, the journal’s first full decade of publication. the 1970s are particularly pivotal in library technology, as it marks the introduction of the personal computer, as a hobbyist’s tool, to society. the web is still more than a decade away, but the seeds are being planted. with this issue, we introduce a new look for the journal — thanks to the work of lita’s web coordinating committee, and in particular kelly sattler (also a member of the editorial board), jingjing wu, and guy cicinelli. the new design is much easier on the eyes and more legible, and sports a new graphic identity for ital. board transitions june marks the changing of the editorial board. 
a significant number of board members’ terms expire this june 30, and i’d like to take this opportunity to thank those departing members for their years of service to information technology and libraries, and the support they have offered me this year as i began as editor. each has ably and generously contributed to the journal’s growth over the last years, and i thank them for their service to the journal and to ital:

• mark cyzyk (johns hopkins university)
• mark dehmlow (notre dame university)
• sharon farnel (university of alberta)
• kelly sattler (michigan state university)
• sandra shores (university of alberta)

these are big shoes to fill, but i am excited about the new members who have been appointed for two-year terms beginning july 1, 2018. in march, we extended a call for volunteers for two-year terms on the editorial board. we received almost 50 applications, and ultimately added seven new members:

• steven bowers (wayne state university)
• kevin ford (art institute of chicago)
• cinthya ippoliti (oklahoma state university)
• ida joiner (independent consultant)
• breanne kirsch (university of south carolina upstate)
• michael sauers (do space, omaha, nebraska)
• laurie willis (san jose public library)

readership survey summary

over the past three months, we ran a survey of the ital readership to try to understand a bit more detail about who you are, collectively. the survey received 81 complete responses out of about 11,000 views of pages with the survey link on them. here are some brief summary results:

• nearly half (46%) of respondents have attended at least one lita event (in-person or online).
• three quarters (75%) of respondents are from academic libraries. public, special, and lis programs make up an additional 20%.
• the majority (56%) are librarians, with the remaining spread across a number of other roles.
• almost two thirds (63%) of respondents have never been lita members, a quarter (25%) are current members, and the remainder are former members.
• about four fifths (81%) of responses came from the current issue (either the table of contents or individual articles).

an invitation

what can you share with your library colleagues in relation to technology? if you have interesting research about technology in a library setting, or are looking for a venue to share your case study, get in touch with me at varnum@umich.edu.

sincerely,
kenneth j. varnum, editor
varnum@umich.edu
june 2018

multimedia will have a profound effect on libraries during the next decade. this rapidly developing technology permits the user to combine digital still images, video, animation, graphics, and audio. it can be delivered in a variety of finished formats, including streaming video on the web, video on dvd/vcd, embedded digital objects within a web page or presentation software such as powerpoint, utilized within graphic designs, or printed as hardcopy. this article examines the elements of multimedia creation, as well as requirements and recommendations for implementing a multimedia facility in the library. the term multimedia, which some may remember being used in the early 1970s as the name for slide shows set to music, now is used to describe “a number of diverse technologies that allow visual and audio media to be combined in new ways for the purpose of communicating.”1 almost all personal computers sold today are capable of viewing multimedia; many can, with minor modifications, also create multimedia. one of the most important features of multimedia is its flexibility. multimedia creation has several distinct elements—inputs, processes performed on those inputs, and outputs (see figure 1). each element can be described as follows.
• inputs—new video can be recorded, or existing video, stored on a hard disk, cd/dvd, or tape, can be imported. the same is true of audio, with the added flexibility of creating soundtracks or sound effects later, during the editing process. digital still images can be used, either shot on a camera or created by scanning an existing picture. digital artwork or animated sequences created in other software also can be brought in.
• processing—regardless of the source, these digital inputs are loaded into the editing software. at this stage, the user will select and arrange the images and sounds, and the software may permit special effects to be created. in addition, the editing software may compress the file so that it is easier to use than the large file sizes produced in raw video and audio recording.
• outputs—at this point, the user has more choices to make. the new multimedia file can be sent to a program that will encode it for streaming video in any one of a variety of popular formats, such as windows media, realmedia, or clipstream. then it can be mounted on a web site (either a regular page or within courseware such as webct or blackboard), or the file could be burned onto a cd or dvd, or it could be used within presentation software such as microsoft powerpoint. or the output file from the editing process could be encoded and embedded so that it runs as an avatar within a web page with a product such as rovion bluestream. the possibilities are nearly endless.

all of this is made possible by advances in technology on a variety of fronts. one of the happy anomalies in technology is that greater performance is frequently accompanied by lower costs. this is certainly the case with much of the activity surrounding multimedia.
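the three-stage creation process just described can be modeled as a simple pipeline. the sketch below is purely illustrative: the function names, the sample clip sizes, and the default 50:1 compression ratio (corresponding to the 1 to 2 percent figure this article cites) are assumptions, not features of any actual editing package.

```python
# illustrative sketch of the multimedia creation pipeline:
# inputs -> processing (edit + compress) -> outputs.
# names, sizes, and the 50:1 ratio are hypothetical placeholders.

def gather_inputs():
    """collect raw media: video, audio, stills (sizes in MB)."""
    return [{"kind": "video", "size_mb": 1000},
            {"kind": "audio", "size_mb": 50},
            {"kind": "still", "size_mb": 5}]

def process(clips, compression_ratio=50):
    """edit and arrange the clips, then compress the combined project.
    a 50:1 ratio reflects the 1-2 percent figure cited in the article."""
    raw_total = sum(c["size_mb"] for c in clips)
    return raw_total / compression_ratio

def output(size_mb, target="web"):
    """encode the edited project for a delivery target
    (web stream, cd/dvd, or presentation software)."""
    return f"{target}: {size_mb:.1f} MB encoded file"

final = output(process(gather_inputs()))
```

on the sample inputs, roughly 1 gb of raw media compresses to about 21 mb, the kind of reduction that makes web delivery of the finished piece practical.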
the following factors have fostered advances in multimedia:

• increase in processing power and decrease in cost of computer hardware;
• quality and affordability of video equipment;
• compression of multimedia files;
• consumer broadband internet access; and
• current multimedia editing software.

the first two technology factors concern the equipment involved in multimedia production. leading off is the familiar, ever-increasing speed of processors and improved memory and hard-drive space, all delivered for less money. this trend is something that many people take for granted, but a reality check is sometimes in order. the processor in the typical desktop machine on advertised special today is approximately forty-four times as fast as the first pentium processor sold ten years ago, and is equipped with sixteen times as much ram and 117 times as much hard-drive space—at 20 percent of the cost of the old machine (not even adjusted for inflation!). the second factor is the incredible quality available in consumer-market video equipment at reasonable costs. while the images produced with consumer-grade video would not play well at the local megaplex movie theater, they look very good on the small screens found on computers, televisions, and classroom projectors. the third factor is that tremendous compression of multimedia files can be achieved during the editing process. an incoming raw-video file (in the standard .avi format) can be compressed with editing, encoding, and dedicated third-party compression software to an incredible 1 to 2 percent of its original size, and it will still retain very good quality as a digital object on the web and in other desktop viewing applications. the fourth factor is extremely critical for the success of multimedia web applications. home access is shifting away from dial-up access to broadband, with its greatly increased transfer rates. half of all united states homes with internet access are already using broadband, and the forecast is for steady increase in these numbers.2

distinctive expertise: multimedia, the library, and the term paper of the future
gregory a. mitchell
information technology and libraries | march 2005
gregory a. mitchell (mitchellg@utpa.edu) is assistant director, resource management at the university of texas—pan american library, edinburg, texas.

although not all broadband is created equal, it is all significantly faster than dial-up access. the final technology factor concerns the software that is currently available to the multimedia web developer. a developer can achieve some quite professional results with even the most basic products, and then can grow into more complex software that supports increasing levels of expertise. once again, this software is being sold in the price range that typical consumers can afford.

small really is beautiful

creating a multimedia lab in the library need not be a large, complex undertaking. in fact, it can be very low cost and as simple as a single workstation. so it is scalable, allowing the library to start small and build in complexity and cost as time, money, and human resources will permit. at the bare-bones minimum, a multimedia lab would consist of a workstation with the software necessary for acquiring, editing, and outputting the files. for practical purposes, though, the workstation should be equipped with a network connection, a cd/dvd burner, a scanner, and a webcam with microphone. another very useful option is an analog-digital bridge device, which enables the capture of analog input (such as vhs tape) into digital files for the editor. to achieve better-quality video when shooting original content, a digital-video camera, tripod, wireless microphone, and portable light kit would be recommended.
since more time typically is spent at the editing station than with the camera, the lab can be expanded with additional workstations before investing in another camera. experience at the author’s institution has shown that it is possible to operate a lab with ten workstations and only three video cameras and three still cameras. finally, output from the editing process will likely be printed, so a photo-quality printer is another convenient option. this illustrates that the entry into multimedia work need not be a large expense, especially if an existing workstation and any other equipment are already available. if a fairly recent workstation is available to dedicate to the project, the library’s total startup cost could range from $200 to $1,000. not many new library services can be launched for as little as that. rather than dwell on equipment specifications, as that is not the intent of this discussion, the reader may consult the excellent tutorials available from desktop video and pc magazine’s online product guide.3 one further worthwhile option is the creation of a studio. although some video will need to be shot on location, many times it is possible to set up and shoot in just one place. a studio is the best place in which to work because it is a controlled environment. it does not need to be large or complicated, and a quiet office or study room can be set up with little effort and expense. the studio gives the users control over the sound and the lighting, and involves minimal setup time for projects.

the research paper of the future

multimedia has begun to attract attention in the library community. joe janes, chair of library and information science at the information school at the university of washington and the person responsible for developing the internet public library, recently stated he foresees a growing role for multimedia in the library. it will replace much of the traditional, text-based communication that people are accustomed to.
for example, multimedia projects can become the research paper of the future for students.4 it is the media in which many library customers will be working. experience from the author’s institution with creating a multimedia lab would seem to confirm his observation. during the first year and a half of operation, use of the lab has steadily increased (see figure 2).

figure 1. multimedia creation process

collaboration

the multimedia lab opens the doors to collaborative opportunities with faculty and students from a variety of disciplines across campus. this is because multimedia, like geographic information systems (gis) or other electronic information and communication technologies, is a tool and is not discipline-specific. as important as it is to make the connection with faculty, this media is something with which the students will frequently lead the way. they are, after all, the mtv generation, and multimedia has an incredible appeal to their visual orientation. faculty themselves have used it to augment their web-based courses as well as traditional classroom instruction. the author’s library has even initiated a multimedia résumé service for graduating students. the students can record a video introduction of themselves, encode this as a rovion bluestream avatar, and post it with their résumés on the web. this creates a much stronger impression than a standard résumé, hopefully giving the students an edge in promoting themselves on the job market. even more impressive is the variety of projects that are created in the lab by the students. one might expect to see interest from students in art and communications classes, but students come from many other disciplines as well.
for example, business students have effectively used multimedia in their graduate-school business-plan presentations, while biology students like to use the graphics capabilities to study close-ups of slides. education students have employed it to produce multimedia instructional aids, and a sociology student put together a presentation on underserved, low-income neighborhoods. the library supplies the facility and instruction—only the imagination of students is needed. libraries have always been involved in the students’ research and writing process, by providing content, instruction, and facilities for producing the final research product. the same is true in the multimedia environment, although implementing a multimedia lab calls for some new skills for librarians. these include familiarity with basic principles of videography, learning how to use the cameras and other equipment, and gaining some mastery of the editing and encoding software.

why put it in the library?

in addition to the research-paper analogy, the author believes that librarians can point with pride to the values and value that libraries offer their communities. the library is a central and neutral location—not in one department’s or college’s turf. libraries are conveniently open for many hours per week. many of the information resources that students might use to prepare the presentation are in the library. and librarians have a professional ethic that drives them to provide instruction and assistance for the services the library offers. since multimedia production does have a learning curve and most new users need help in mastering the technology, it does not fit very well with the typical 24/7 drop-in computer lab that the campus information technology (it) department often operates. this is a good opportunity for librarians to recognize some of their strengths and capitalize on them. in addition, this can be a breath of fresh air for librarians.
here is an opportunity to learn about something new and creative. most people find that they have less room for creativity as time goes by.5 with a multimedia lab in the building, librarians will have the opportunity to create multimedia productions for the library, besides assisting students and faculty with their projects.

figure 2. university of texas—pan american library multimedia lab usage

potential problems

there are some obstacles to overcome, of course. they need not be seen as major, but it is best to be realistic when beginning any new venture. it is almost always a good idea to start small, with a pilot project that will yield valuable lessons before venturing into anything big.

• equipment—define what specifications are needed, see what is already available to use or borrow, then figure out what you will actually need to buy.
• software—check out the variety of software for editing and production; think about how you want to begin using multimedia (primarily on the web, in presentation software such as powerpoint, or as standalone videos on cds and dvds).
• money—if funding permits, a library can invest several thousand dollars in a high-end multimedia computer, associated peripherals such as a color printer and one or more scanners, and a software suite to meet initial anticipated demands for multimedia creation and editing. if funding is scarce, you may want to investigate what existing equipment could be used in support of a pilot project.
• location—this needs some space of its own, accessible to students and monitored by staff. although the editing workstation could be in an area with other computers, a quiet area is needed for shooting video so that there will not be interference from noise and unwanted foot traffic through the shots.
• staffing and training—a multimedia lab is not a good candidate for self-service. librarians and staff who will provide the service need to learn how to use the equipment and software.
make sure that they all have an acceptable level of competence and confidence so that the library can shine with its new service, but expect that everyone will need to continue to learn and grow in their proficiency. if your library plans to produce its own multimedia sessions as well, it would be a good investment to attend a class on television or video production.
• hours—how many hours per week will the new service be available? if it is the entire time the library is open, be prepared to train plenty of staff. repeat users will need less help as their skills increase (by the way, some of these students can be great work-study employees).
• instruction—plan to offer formal orientation and instruction sessions to faculty and their classes. if your lab is small, this is challenging, but it can be accomplished with some creativity. for example, a general instruction session on concepts can be done in a classroom, followed up by a series of small groups working by appointment for the applied-learning component in the multimedia lab. the author and a colleague have even done instruction outside the library using laptops and cameras, creating a de facto mobile studio.
• copyright—if there are already vcrs or photocopiers in the library, you have had to deal with this issue. the pan american library at the university of texas does not allow people to use its lab to copy movies, which is a request that surely will come to you, and we post the usual copyright notices just as we do at our photocopiers. for some excellent information on copyright, visit the american library association web site (www.ala.org).
• evaluation—plan on at least basic evaluation of the service. this can include an assessment of the effectiveness of the instruction sessions, a survey of satisfaction with the lab itself, a questionnaire on the intended uses of the multimedia projects, demographic data on the students, or other student input.
logs of the number of uses and peak-demand periods are extremely useful for planning and for justifying further expenditures and staffing requests.
• flexibility for the future—whatever you do in a pilot phase, always keep an open mind: you are trying to learn from the experience so that you can make good decisions for the direction of this new service. it may not go exactly the way you originally thought, because of serendipity, or changes in technology, or very strong demand from some segments of the campus instead of others, or other environmental factors.

conclusion

benefits to the library from the multimedia lab are many. one of the most important benefits is that it keeps the library involved in the process of academic communication as the medium of that communication changes with technology. by being involved in this evolving medium at its early stages, the library is poised to pounce on opportunities to employ it to the benefit of the library in instruction and content delivery. the library also would position itself on campus as a key player in it and the leading local expert in the growing field of multimedia. since multimedia is a tool that crosses the entire range of subject disciplines on campus, it opens the doors for faculty to collaborate with librarians in exciting new ways. just as many campuses already have learning and collaborative communities that grew around their web courseware or gis endeavors, so too can one develop around multimedia. the appendix offers a list of multimedia web sites to consider. libraries are more than warehouses of books and periodicals. as more and more of our resources have been made available electronically, and indeed more of higher education has moved to electronic delivery, many libraries have been faced with declining gate counts, circulations, and reference statistics. as someone observed, we are victims of our own success. so what is the role of the library?
we are intrinsically involved in the process of instruction, academic research, and communication. as kling observed, “one important strategic idea is that libraries configure their it services and activities to emphasize the distinctive expertise of their librarians rather than simply concentrate on the size and character of the documentary collection.”7 it is imperative therefore that libraries pick out the new trends that will allow them to excel by capitalizing on their traditional strengths.

references

1. scala, inc., multimedia directory, accessed apr. 21, 2004, www.scala.com/multimedia/multimedia-definition.html.
2. nielsen/netratings as of june 2004, accessed aug. 10, 2004, www.websiteoptimization.com/.
3. about.com, dvt101, accessed apr. 15, 2004, http://desktopvideo.about.com/library/weekly/aa040703a.htm; “anatomy of a video editing workstation,” pc magazine, accessed apr. 16, 2004, www.pcmag.com/article2/0,1759,1264650,00.asp.
4. college of dupage, “joe janes and colleagues: preparing for the future of digital reference,” a satellite broadcast from the college of dupage, 16 apr. 2004.
5. sandra kerka, creativity in adulthood (columbus, ohio: eric clearinghouse on adult career and vocational education, eric digest no. 204, ed429186, 1999).
6. american library association, “copyright issues, primer on the digital millennium,” accessed may 10, 2004, www.ala.org/ala/washoff/woissues/copyrightb/dmca/dmcprimer.pdf.
7. rob kling, “the internet and the strategic reconfiguration of libraries,” library administration & management 15, no. 3 (summer 2001): 144–51.

appendix. for further reading: a multimedia web-site tour

the following is a sampling of some of the most popular and interesting multimedia software, with examples of completed productions.
this is not an official endorsement of any one product over another, whether listed here or not. a look at these sites will, however, give the reader an idea about the power and possibilities of multimedia communications.

adobe (www.adobe.com)
the well-known makers of some of the most powerful and popular editing software packages for graphics and video.

camtasia (www.camtasia.com)
easy to use, this is a good example of the type of software that does screen capture and recording, which is handy for producing online tutorials.

clipstream (www.clipstream.com)
an excellent example of the type of newer encoding software that achieves incredible compression of video and delivers it over the web with no viewer or plug-ins required for the user.

finalcut pro (www.apple.com/finalcutpro)
a perennial favorite among the mac crowd, this software is relatively easy to learn and lets the developer achieve dramatic results.

flashants (www.flashants.com)
a handy program that converts flash animation into .avi video format so that you can integrate animated sequences into a video production.

macromedia (www.macromedia.com)
the makers of flash and director, which are some of the most popular graphics, animation, and multimedia editing tools in the business.

pinnacle (www.pinnaclesys.com)
what finalcut pro is to the mac, this package is for the pc environment. easy to use, yet sophisticated in the results achieved.

rovion (www.rovion.com)
rovion bluestream is an encoder that enables the creation of avatar characters to appear live on your web page. a plug-in is required for the user, but this approach definitely gets attention.

serious magic (www.seriousmagic.com)
an award-winning software package that allows you to turn a workstation into a studio, complete with teleprompter capability, sound effects, graphics, and editing.
university of texas—pan american library (www.lib.panam.edu/libinfo/media.asp)
links to multimedia projects at the author’s institution, including productions made by staff and students.

identifying key steps for developing mobile applications and mobile websites for libraries
devendra dilip potnis, reynard regenstreif-harms, and edwin cortez
information technology and libraries | september 2016

abstract

mobile applications and mobile websites (mamw) represent information systems that are increasingly being developed by libraries to better serve their patrons. because of a lack of in-house it skills and the knowledge necessary to develop mamw, a majority of libraries are forced to rely on external it professionals, who may or may not help libraries meet patron needs but instead may deplete libraries’ scarce financial resources. this paper applies a system analysis and design perspective to analyze the experience and advice shared by librarians and it professionals engaged in developing mamw. this paper identifies key steps and precautions to take while developing mamw for libraries. it also advises library and information science graduate programs to equip their students with the specific skills and knowledge needed to develop and implement mamw.

introduction

the unprecedented adoption and ongoing use of a variety of context-specific mobile technologies by diverse patron populations, the ubiquitous nature of mobile content, and the increasing demand for location-aware library services have forced libraries to “go mobile.” mobile applications and mobile websites (mamw), that is, web portals running on mobile devices, represent information systems that are increasingly being developed and used by libraries to better serve their patrons. however, a majority of libraries often lack the in-house human resources necessary to develop mamw.
because of a lack of staff equipped with the requisite it skills and knowledge, libraries are often forced to partner with and rely on external it professionals, potentially losing control over the process of developing mamw.1 partnerships with external it professionals do not always help libraries meet the information needs of their patrons but instead can deplete their scarce financial resources. it then becomes necessary for librarians to understand the process of developing mamw so they can better evaluate mamw for serving library patrons.

devendra dilip potnis (dpotnis@utk.edu) is associate professor, school of information sciences; reynard regenstreif-harms (reynardrh@gmail.com) is project archives technician, great smoky mountains national park, gatlinburg, tennessee; and edwin cortez (ecortez@utk.edu) is professor, school of information sciences, university of tennessee at knoxville. doi:10.6017/ital.v35i2.8652

one possibility is for librarians to re-educate themselves through continuing education or other professional development activities. another solution would be to see library and information science (lis) schools strengthen their curriculum in the area of management, evaluation, and application of mamw and related emerging technologies. issues, challenges, and strategies for providing librarians with these opportunities are abundant and have been debated for more than thirty years, especially since libraries started experiencing the impact of microchip and portable technologies.2 any practical and immediate guidance could help librarians in charge of developing mamw.3 however, a majority of the practical guidance available for developing mamw for libraries is limited to specific settings or patron populations.
also, the practical guidance is not theoretically validated, curtailing its generalizability to diverse library settings. for instance, a number of librarians and it professionals share their experience of mamw development in serving a specific patron population in a specific library setting.4,5 their accounts typically describe successes in developing mamw, the lessons learned during development, or advice for developing mamw. this paper applies a system analysis and design perspective from the information systems discipline to examine the experience and advice shared by librarians and it professionals, identifying the key steps and precautions to be taken when developing mamw for libraries. system analysis and design, a branch of the information systems discipline, is the most widely used theoretical knowledgebase available for developing information systems.6 according to the system analysis and design perspective, development, planning, analysis, design, implementation, and maintenance are the six phases of building any information system.7 the next section synthesizes our method for this secondary research. the following section discusses the key steps we identified for developing, planning, analyzing, designing, implementing, and maintaining mamw for libraries. the concluding section presents the implications of this study for libraries and lis graduate programs.
Information Technology and Libraries | September 2016

Method

We began this study with a practitioner's handbook guiding libraries in using mobile technologies to deliver services to diverse patron populations.8 To search the literature relevant to our research, we devised many key phrases, including but not limited to "mobile technolog*," "mobile applications for libraries," and "mobile websites for libraries." As part of our active information-seeking process, we applied a snowball sampling technique to collect more than seventy-five scholarly research articles, handbooks, ALA Library Technology Reports, and books hosted on the EBSCO and Information Science Source databases. Our passive information seeking was aided by article suggestions from Emerald Insight and Elsevier ScienceDirect, two of the most widely used journal hosting sites, in response to the journal articles we accessed there. We applied the following four criteria to establish the relevancy of publications to our research: accuracy of facts; period of publication (i.e., from 2000 to 2014); credibility of authors; and content focused on problems, solutions, advice, and tips for developing MAMW. Several research articles published by Information Technology and Libraries and Library Hi Tech, two top-tier journals covering the development of MAMW for libraries, built the foundation of this secondary research. We analyzed the collected literature using the qualitative data presentation and analysis method proposed by Miles and Huberman.9 We developed Microsoft Excel summary sheets to code the experience and advice shared by librarians and IT professionals. The coded data was read repeatedly to identify and name patterns and themes. Each relevant publication was analyzed individually and then compared across subjects to identify patterns and common categories. The inter-coder reliability between the two authors who analyzed the data was 85 percent.
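The 85 percent inter-coder reliability reads as simple percent agreement between the two coders. The paper does not state which agreement measure it used, so the following is only a minimal sketch of that calculation; the function name and the sample codes are hypothetical.

```javascript
// Illustrative sketch (not from the paper): inter-coder reliability as
// simple percent agreement between two coders labeling the same excerpts.
function percentAgreement(coderA, coderB) {
  if (coderA.length !== coderB.length) {
    throw new Error("Both coders must label the same items");
  }
  // Count positions where both coders assigned the same code.
  const matches = coderA.filter((label, i) => label === coderB[i]).length;
  return (matches / coderA.length) * 100;
}

// Hypothetical codes assigned by two coders to ten excerpts:
const a = ["plan", "plan", "design", "test", "plan", "design", "test", "plan", "design", "test"];
const b = ["plan", "design", "design", "test", "plan", "design", "test", "plan", "design", "plan"];
console.log(percentAgreement(a, b)); // 80
```

More robust measures (e.g., Cohen's kappa) correct for chance agreement, but percent agreement is the simplest reading of the figure reported above.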
Data analysis helped us identify the key steps needed for planning, analyzing, designing, implementing, and maintaining MAMW for libraries.

Findings and Discussion

Key Steps for Planning MAMW

Forming and managing a team

Building teams of people with the appropriate skills, knowledge, and experience is one of the first steps suggested by the existing literature for planning MAMW. It is essential for team members to be aware of new developments and trends in the market.10 For instance, developers should be aware of print resources on relevant technologies such as Apache, ASP, JavaScript, PHP, Ruby on Rails, and Python; online resources such as detectmobilebrowser.com and the W3C mobileOK Checker for testing catalogs, design functionality, and accessibility on mobile devices; and various online communities of developers who could provide peer support when needed.11 Team members are also expected to keep up with new developments in mobile devices, platforms, operating systems, digital rights management terms and conditions, and emerging standards for content formats.12 Periodic delegation of various tasks could help libraries develop MAMW effectively.13 Libraries should also form productive, financially feasible partnerships with external stakeholders such as internet service providers and network administrators for hosting MAMW on internet servers that meet desired safety and security standards.14,15

Requirements gathering

Requirements for developing MAMW can be collected through empirical research and secondary research. Typically, the goal of empirical research is to help libraries

- gather patron preferences for and expectations of MAMW;16,17
- stay abreast of the continual evolution of patron needs;18
- periodically (e.g., quarterly, annually, or biannually) gather and evaluate user needs;19
- index the content of MAMW;20
- investigate patrons' acceptance of the library's use of MAMW;21 and
- understand user needs and identify the top library services requested by patrons.

Empirical research in the form of usability testing, functional validation, user surveys, etc., should be carried out before developing MAMW to inform the development process and/or after developing MAMW to study their adoption by library patrons. Empirical research typically involves identifying the patrons and other stakeholders who will be affected by MAMW. This step is followed by developing data-collection instruments, collecting data from patrons and other stakeholders, and analyzing the qualitative and quantitative data using appropriate techniques and software.22 Secondary research mainly focuses on scanning and assessing the existing literature. For instance, using appropriate datasets on mobile use, librarians may be able to identify the factors responsible for the adoption of mobile technologies.23 Typically, such factors include but are not limited to the cognitive, affective, social, and economic conditions of potential users. MAMW developers could also scan the environment by examining existing MAMW and reviewing the literature to create sets of guidelines for replacing old information systems with new, well-functioning MAMW.24 Librarians could also scan the market for free software options to conserve financial resources.25

Making strategic choices

Mobile applications or mobile websites? One of the most important strategic decisions libraries need to make during this phase is whether to use a mobile app or a mobile website—that is, a web portal running on mobile devices—for offering services to patrons.
Mobile websites are web browser-based applications that might direct mobile users to a different set of content pages; serve a single set of content to all patrons while using different style sheets or templates reformatted for desktop or mobile browsers; or use a site transcoder (a rule-based interpreter), which resides between a website and a web client and intercepts and reformats content in real time for a mobile device.26,27 Mobile apps are more challenging to build than mobile websites because they require separate, specific programming for each operating system.28 Mobile apps also burden users and their devices: users are expected to remember the functionality of each menu item, and a significant amount of memory is required to store and support apps on mobile devices. However, potential profitability, better mobile-device functionality, and greater exposure through app stores can make mobile apps a more economical option than mobile websites.29

Buy or build? In the planning phase, libraries also need to decide whether to buy commercial, off-the-shelf (COTS) MAMW or build customized MAMW. When making this choice, MAMW need to be evaluated in terms of customer support and service, maintenance, and the ability to meet patron and library needs.30 Sometimes libraries purchase COTS products and end up customizing them, benefiting from both options. For example, some libraries first purchase packaged mobile frameworks to create simple, static mobile websites and subsequently develop dynamic library apps specific to library services.31

Managing scope

Many libraries have limited financial resources, which makes it necessary for their staff to manage the scope of MAMW development.
The ability to prioritize tasks and identify mission-critical features of MAMW is among the most common ways libraries manage this scope.32 For instance, it is not practical to make an entire library website mobile, because libraries would then end up serving only those patrons who access their sites over mobile devices alone. Instead, libraries should determine which parts of the website should go mobile. A growing trend of "mobile first" design, in which a mobile version of a website is designed first and then scaled up to a larger desktop version, could help librarians better manage the scope of MAMW development. Alternatively, Jeff Wisniewski, a leading web services librarian in the United States, advises libraries to create a new mobile-optimized homepage alone, which is faster than trying to retrofit the library's existing homepage for mobile.33 This advice is highly practical because no webmaster wants to maintain two distinct versions of the library's webpages containing details such as hours of operation and contact information.

Selecting the appropriate software development method

There are three key methods for developing MAMW: structured methodologies (e.g., waterfall or parallel development), rapid application development (e.g., phased development, prototyping, or throwaway prototyping), and agile development, an umbrella term for a collection of agile methodologies such as Crystal, Dynamic Systems Development Method, Extreme Programming, Feature-Driven Development, and Scrum. There is a bidirectional relationship between these development methods and the resources available for development: project resources such as funding, duration, and human resources influence and are affected by the type of software development method selected for developing MAMW.
However, studies rarely pay attention to this important dimension of the planning phase.34

Key Steps in the Analysis Phase

Requirements analysis

After collecting data from patrons, the next natural step is to analyze the data to inform the process of conceptualizing, building, and developing MAMW.35 The requirements-analysis phase helps libraries achieve a user-centered design for MAMW and assess the return on investment in MAMW. The context and goals of patrons using mobile devices, and the tasks they are likely and unlikely to perform on a mobile device, are the key considerations for developing user-centered MAMW for library patrons.36 It is critical to gather, understand, and review user needs.37 Surveys can be administered on paper or online and analyzed using advanced statistical techniques or qualitative analysis software.38,39 The analysis allows the following questions to be answered: Which library services do patrons use most frequently on their mobile devices? How satisfied are they with those services? What types of library services and products would they like to access with their mobile phones in the future? Survey analyses can help librarians predict which mobile services patrons will find most useful;40 they can also help librarians classify users on the basis of their perceptions, experience, and habits when using mobile technologies to access library services.41 As a result, libraries can identify and prioritize functional areas for their MAMW deployment.42 MAMW developers can also learn from their users' humbling and/or frustrating experiences of using mobile devices for library services.
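At their simplest, the survey analyses described above (which services patrons use most frequently, which they want next) reduce to tallying and ranking responses. The following is a hypothetical sketch; the function name, service names, and responses are invented for illustration, and real analyses would use the statistical or qualitative software mentioned above.

```javascript
// Hypothetical sketch: rank library services by how often survey
// respondents report using them on a mobile device.
function rankServices(responses) {
  const counts = new Map();
  for (const service of responses) {
    counts.set(service, (counts.get(service) || 0) + 1);
  }
  // Sort descending by count so the most-used services come first.
  return [...counts.entries()].sort((x, y) => y[1] - x[1]);
}

// Invented survey responses:
const responses = [
  "catalog search", "hours", "catalog search",
  "renew loans", "catalog search", "hours",
];
console.log(rankServices(responses));
// catalog search (3), hours (2), renew loans (1)
```

A ranking like this is what lets libraries identify and prioritize functional areas for MAMW deployment, as the studies cited above suggest.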
In addition, libraries can keep track of their patrons' positive and negative observations, their information-sharing practices, and how they create group experiences on the platform provided by their libraries.43 To improve existing MAMW, libraries could also use Google Analytics, a free web metrics tool, to identify the popularity of MAMW features and analyze statistics on how they are used.44 To develop operating system-specific mobile apps, Google Analytics can be used to learn about the popularity of the mobile devices used by patrons.45 Ideally, libraries should calculate and document return on investment (ROI) before investing in the development of MAMW.46 For instance, libraries can run a cost-benefit analysis on the process of developing MAMW and compare various library services offered over mobile devices.47 Typically, the following data could help libraries run the cost-benefit analysis: specific deliverables (e.g., features of MAMW), resources (e.g., resources needed and resources available), risks (e.g., types and levels of risk), performance requirements, and security requirements for developing MAMW. This analysis would help libraries make decisions on service provision, such as the specific goals to be set for developing MAMW, the feasibility of introducing desired MAMW features, and how to manage available resources to meet the set goals.48 Libraries should also examine what other libraries have already done to provide mobile services.49

Communication/liaising with stakeholders

Effective communication between developers and stakeholders influences almost every aspect of developing information systems. However, existing studies do not emphasize the significance of communication with stakeholders. For instance, several studies vaguely refer to the translation of user needs into technology requirements.50 But few studies point out the precise modeling technique (e.g., entity-relationship diagrams, Unified Modeling Language, etc.)
for converting user needs into a language understood by software developers. Developers should communicate best practices and suggestions for the future implementation of MAMW in libraries,51 which involves predicting and selecting appropriate MAMW for libraries,52 demonstrating what is possible and how services are relevant, and showing how new resources can help create value for libraries.53,54 Communication with users is also critical for creating value-added services for patrons who use different mobile technologies to meet needs related to work, leisure, commuting, etc.55 However, the existing literature on MAMW development for libraries does not discuss the significance of this activity.

Key Steps for Designing MAMW

Prototyping

Prototyping refers to the modeling or simulation of an actual information system. MAMW can have paper-based or computer-based prototypes. Prototyping allows developers to communicate directly with MAMW users to seek their feedback; developers can then correct or modify the original design of MAMW until users and developers agree on the system design. Building consensus between MAMW developers and potential users is a key challenge to overcome during this phase, and it may put a financial burden on MAMW development projects, requiring skilled personnel to manage the scope, time, human resources, and budget of such projects. Wireframing is one of the most prominent prototyping techniques practiced by librarians and IT professionals developing MAMW for libraries.56 This technique depicts schematic on-screen blueprints of MAMW, lacking style, color, or graphics, and focuses mainly on functionality, behavior, and priority of content.
Selecting hardware, programming languages, platforms, frameworks, and toolkits

The existing literature on the development of MAMW for libraries covers the selection and management of software; software development kits; scripting languages like JavaScript; data management and representation languages such as HTML and XML, along with their text editors; and AJAX for animations and transitions. The existing literature also guides libraries in training their staff to use MAMW to better serve patrons.57 A few studies also provide guidance on selecting COTS products such as WebKit, an open source web browser engine that renders webpages on smartphones and allows users to view high-quality graphics on data networks with faster throughput.58 It might be a good idea to use licensed open source COTS products, because licensed software allows libraries to legally distribute the software within their organizations as covered by the licensing agreement. Libraries that use software-licensing agreements may also be able to seek expert help and advice whenever they have a concern or query. In the authors' experience, librarians have shared a few effective strategies for designing MAMW. One key strategy is to acquire reliable device emulators and cross-compatible web editors. These technologies allow the user to work with the design at the most basic level, save documents as text, transfer documents between web programs, and direct designers toward simple solutions.59 Sample cross-compatible web editors include, but are not limited to, NoteTab Pro (http://www.notetab.com/), CodeLobster (http://www.codelobster.com/), and Bluefish (http://bluefish.openoffice.nl).
Hybrid mobile app frameworks like Bootstrap, Ionic, Mobile Angular UI, Intel XDK, Appcelerator Titanium, Sencha, Kendo UI, and PhoneGap use a combination of web technologies like HTML, CSS, and JavaScript for developing mobile-first, responsive MAMW. A majority of these frameworks use a drag-and-drop approach and do not require any coding for developing mobile apps; one-click API connections further simplify the process. User-interface frameworks like jQuery Mobile and Topcoat eliminate the need to design user interfaces manually. Importantly, MAMW developed using such frameworks can support many mobile platforms and devices. Toolkits like GitHub, Skyronic, CRUDKit, and HawHaw enable developers to quickly build mobile-friendly CRUD (create/read/update/delete) interfaces for PHP, Laravel, and CodeIgniter apps. Such mobile apps also work with MySQL and other databases, allowing applications to receive and process data and display information to users. Table 1 categorizes specific hardware and software features recommended for MAMW to better serve library patrons.
1. Human-computer interaction (HCI)
   - Behavioral, cognitive, motivational, and affective aspects of HCI: design responsive websites for libraries to enhance the user experience;60 design a user interface meeting the expectations and needs of potential users (e.g., a menu with items such as library catalog, patron accounts, ask a librarian, contact information, and listing of hours);61 design meaningful mobile websites based on user needs, documenting and maintaining mobile websites62
   - Usability engineering: design concise interfaces with limited links, descriptive icons, and home and parent-link icons;63 create a user-friendly site (e.g., the DOK Library Concept Center in Delft, Netherlands, offers a welcome text message to first-time visitors);64 effectively transition from traditional websites to mobile-optimized sites with responsive design;65 create user-friendly interface designs;66 present a clean, easy-to-navigate mobile version of search results67
   - Information visualization: automatically maintain reliable and stable fundamental information required by indoor localization systems;68 save time by redesigning existing sites69,70
2. Web programming
   - HTML, XML, etc.: design sites with a complete separation of content and presentation;71 code HTML and CSS for better user experiences;72 create and shorten links to make them easier to input using small or virtual keyboards73
   - Client-side and server-side scripting (such as JavaScript Object Notation): design and develop mashups;74 develop MAMW using client-server architecture, accessible on mobile devices75
   - Without scripting: implement widgetization to facilitate the integration of mobile websites, developing a widget library for mobile-based web information systems76
3. Open source: design mobile websites that allow users to leverage the same open source technology as the main websites;77 design mobile websites linking to other existing services like Library H3lp and library catalogs with mobile interfaces such as MobileCat78
4. Networking: design a mobile website capable of exploiting advancements in technology such as faster mobile data networks;79 identify and address technology issues (e.g., connectivity, security, speed, and signal strength) faced by patrons when using MAMW80
5. Input/output devices: use a mobile robot to determine the location of fixed RFID tags in space;81 design MAMW capable of processing data communicated using radio-frequency identification (RFID) devices, near-field communication technology, and Bluetooth-based technology like iBeacons;82 offer innovative services using augmented-reality tools83
6. Databases: integrate a back-end database of metadata with front-end mobile technologies;84 integrate the front end of MAMW with the back end of standard databases and services85
7. Social media and analytics: integrate social media sites (e.g., Foursquare, Facebook Places, Gowalla) with existing checkout services for accurate and information-rich entries;86 implement Google Voice or a free text-messaging service;87 use Google Analytics on a mobile-optimized website by copying the free JavaScript code generated by Google Analytics and pasting it into library webpages to gain insight into which resources are used and who uses them;88 integrate a geolocation feature with mobile services89

Table 1. MAMW with specific hardware and software features

From the table above, which is based on the analysis of the literature on developing mobile applications and mobile websites for libraries, it becomes clear that web programming and HCI are the two leading technology areas that shape the development of MAMW and, consequently, the services offered by them.

Designing user interfaces of MAMW

Librarians and IT professionals engaged in developing MAMW for libraries make the following recommendations.

Use two style sheets: CSS plays a key role in giving a uniform appearance to the user interfaces of all webpages. Studies recommend designing two style sheets—namely, mobile.css and iphone.css—when developing MAMW, since smartphones often ignore mobile style sheets.90 In that case, iphone.css can direct itself to browsers of a specific screen width, helping those mobile devices that are not directed to the mobile website by the mobile.css style sheet.91

Minimize the use of JavaScript: JavaScript is instrumental in detecting which mobile device a patron is using and then directing them to the appropriate webpage, with options including a full website, a simple text-based site, and a touch-mobile-optimized site. However, it is critical to minimize the use of JavaScript on library mobile websites because not every smartphone offers the minimum level of support required to run it.92

Handle images intelligently: to help patrons optimize their bandwidth use, image files on mobile sites should be incorporated with CSS rather than HTML code; also, to ensure consistency in the appearance of mobile website user interfaces, images should be kept to the same absolute size.93

Key Steps for Implementing MAMW

Programming for MAMW

Programming is at the heart of developing MAMW. As shown in table 1 above, web programming enables developers to build MAMW with a number of value-added features for patrons.
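One such feature, detecting the patron's device and choosing the appropriate page variant or style sheet as recommended above, can be sketched in a few lines. This is a hypothetical illustration: the regular expression and the 480px breakpoint are assumptions, not values taken from the cited studies; only the file names mobile.css and iphone.css come from the text.

```javascript
// Hypothetical sketch of device detection and the two-stylesheet rule.
function isMobileBrowser(userAgent) {
  // A deliberately small check; real detection scripts test many more tokens.
  return /Mobile|Android|iPhone|iPad|BlackBerry/i.test(userAgent);
}

function pickStylesheet(screenWidth) {
  // iphone.css targets narrow smartphone screens that ignore mobile
  // style sheets; wider mobile browsers fall back to mobile.css.
  // The 480px breakpoint is an assumption for illustration.
  return screenWidth <= 480 ? "iphone.css" : "mobile.css";
}

const ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X) Mobile/13E233";
console.log(isMobileBrowser(ua)); // true
console.log(pickStylesheet(320)); // iphone.css
```

Consistent with the "minimize JavaScript" advice above, detection like this is often done server-side or via CSS media queries, so that phones with weak JavaScript support still receive a usable page.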
For instance, a web-application server running ColdFusion can process data communicated via web browsers on mobile devices; this feature allows MAMW users to access search engines on library websites via smartphones.94 Also, client-side processing of classes (with a widget library) allows patrons to use their mobile devices as thin clients, thereby optimizing the use of network bandwidth.95

Testing MAMW

Past studies recommend testing the content, display/design, and functionality of MAMW in a controlled environment (e.g., a usability lab) or in the real world (i.e., in libraries).

Content: librarians are advised to set up testing databases for testing image presentation, traditional free-text search, location-based search, barcode scanning for ISBN search, QR encapsulation, and voice search.96

Display/design: librarians can review and test MAMW on multiple devices to confirm that everything displays and functions as intended.97 They can also test a beta version of their mobile website on varying devices to provide guidance regarding image sizing;98 beta versions are also useful for testing mobile websites' display on different browsers and devices.99

Functionality: librarians can set up testing practices and environments for the most heavily used device platforms (e.g., HCI incubators such as eye-testing software, which combines virtual emulators with mobile devices not owned by libraries).100,101 They can also use the User Agent Switcher add-on for Firefox to test a mobile website, and use web-based services like DeviceAnywhere and BrowserCam, which offer mobile emulation, to test the functionality of MAMW.102

Training patrons

Unless patrons realize the significance of a new information system for managing information resources, they will hardly use it. However, training patrons to use newly developed MAMW is almost completely missing from the studies describing the process of developing MAMW for libraries.
Joe Murphy, a technology librarian at Yale University, identifies the significance of user training in managing the change from traditional to mobile search and advises librarians to explore the mobile literacy skills of their patrons and educate them on how to use new systems.103

Data management

MAMW cannot function properly without clean data. Cleaning up data, curating data, and addressing other data-related issues are some of the least mentioned activities in the literature on developing MAMW. However, it is necessary for librarians engaged in developing MAMW to identify and address the common challenges of managing the data used by MAMW. For example, it might be a good strategy for librarians to study best practices for managing data-related issues when offering reference services using SMS.104

Skills Needed for Maintaining MAMW

Documentation and version control of software

Past studies recommend developing a mobile strategy for building a mobile-tracking device and evaluating mobile infrastructure to ensure the continued assessment and monitoring of mobile usage and trends among patrons.105 However, past studies do not report or provide many details about the maintenance of MAMW, which leads us to infer that maintenance of MAMW, including documentation and version control, is a neglected aspect of their development. Open source software development is increasingly becoming a common practice for developing MAMW. Implementing version-control software (e.g., Subversion and GitHub) to accommodate the needs of developers distributed across the world is a necessity for developing MAMW.
Version-control software provides a code repository with a centralized database for developers to share their code, which minimizes the errors associated with overwriting or reverting code changes and maximizes collaboration in software development.106

Conclusion

Various forces are driving change in the knowledge and skills required of information professionals: technologies, changing environments, and the changing role of IT in managing and providing services to patrons. These forces affect all levels of IT-based professionals, both those responsible for information processing and those responsible for information services. This paper has examined the key steps and precautions to be taken when developing MAMW to better serve library patrons. After analyzing the existing guidance offered by librarians and IT professionals from the system analysis and design perspective, we find that some of the most ignored activities in MAMW development are selecting appropriate software development methodologies, prototyping, communicating with stakeholders, software version control, data management, and training patrons to use newly developed or revamped MAMW. The lack of attention to these activities could hinder libraries' ability to better serve patrons using MAMW. It is necessary for librarians and IT professionals to pay close attention to these activities when developing MAMW. Our study also shows that web programming and HCI are the two most widely used technology areas for developing MAMW for libraries. To save their scarce financial resources, which otherwise could be spent on partnering with external IT professionals, libraries could either train their existing staff or recruit LIS graduates equipped with the skills and knowledge identified in this paper to develop MAMW (see table 2).
# key steps for developing mamw skills and knowledge required for developing mamw a planning phase 1 forming and managing team human resource management 2 making strategic choices time management cost management quality management human resource management (e.g., staff capacity) 3 requirements gathering research (empirical and secondary) 4 managing scope (e.g., managing financial resources, prioritizing tasks, identifying mission-critical features of mamw, etc.) scope management 5 selecting an appropriate software development method time management cost management quality management b analysis phase 6 requirements analysis research (empirical and secondary) 7 communication/liaising with stakeholders communications management c design phase 8 prototyping software development (hci) 9 selecting hardware and programming languages and platforms software development (web programming and hci) 10 designing user interfaces of mamw software development (hci) d implementation phase 11 programming for mamw software development (web programming—e.g., android, ios, visual c++, visual c#, visual basic, etc.) 12 testing mamw software development (web programming and hci) identifying key steps for developing mobile applications & mobile websites for libraries | potnis, regenstreif-harms, and cortez |doi:10.6017/ital.v35i2.8652 56 13 training patrons human resource management 14 data management (e.g., cleaning up data, curating data, etc.) data management e maintenance phase 15 documentation and version control of software software development (web programming and hci) table 2. 
skills and knowledge necessary to develop mamw the management of scope, time, cost, quality, human resources, and communication related to any project is known as project management.107 in addition to the skills and knowledge related to project management, librarians would also need to be proficient in software development (with an emphasis on hci and web programming), data management, and the proper methods for conducting empirical and secondary research for developing mamw. if lis programs equip their graduate students with the skills and knowledge identified in this paper, the next generation of lis graduates could develop mamw for libraries without relying on external it professionals, which would make libraries more self-reliant and better able to manage their financial resources.108 this paper assumes a very small number of scholarly publications to be reflective of the realworld scenarios of developing mamw for all types of libraries. this assumption is one of the limitations of this study. also, the sample of publications analyzed in this study is not statistically representative of the development of mamw for libraries around the world. in the future, the authors plan to interview librarians and it professionals engaged in developing and maintaining mamw for their libraries to better understand the landscape of developing mamw for libraries. references 1. devendra potnis, ed cortez, and suzie allard, “educating lis students as mobile technology consultants” (poster presented at 2015 association for library and information science education annual meeting, chicago, january 25–27), http://f1000.com/posters/browse/summary/1097683. 2. edwin michael cortez, “new and emerging technologies for information delivery,” catholic library world no. 54 (1982): 214–18. 3. kimberly d. pendell and michael s. bowman, “usability study of a library’s mobile website: an example from portland state university,” information technology & libraries 31, no. 
2 (2012): 45–62, http://dx.doi.org/10.6017/ital.v31i2.1913.
4. godmar back and annette bailey, “web services and widgets for library information systems,” information technology & libraries 29, no. 2 (2010): 76–86, http://dx.doi.org/10.6017/ital.v29i2.3146.

information technology and libraries | september 2016 57

5. hannah gascho rempel and laurie bridges, “that was then, this is now: replacing the mobile optimized site with responsive design,” information technology & libraries 32, no. 4 (2013): 8–24, http://dx.doi.org/10.6017/ital.v32i4.4636.
6. june jamrich parsons and dan oja, new perspectives on computer concepts 2014: comprehensive, course technology (boston: cengage learning, 2013).
7. ibid.
8. andrew walsh, using mobile technology to deliver library services: a handbook (london: facet, 2012).
9. matthew b. miles and a. michael huberman, qualitative data analysis (thousand oaks, ca: sage, 1994).
10. bohyun kim, “responsive web design, discoverability and mobile challenge,” library technology reports 49, no. 6 (2013): 29–39, https://journals.ala.org/ltr/article/view/4507.
11. james elder, “how to become the ‘tech guy’ and make iphone apps for your library,” the reference librarian 53, no. 4 (2012): 448–55, http://dx.doi.org/10.1080/02763877.2012.707465.
12. sarah houghton, “mobile services for broke libraries: 10 steps to mobile success,” the reference librarian 53, no. 3 (2012): 313–21, http://dx.doi.org/10.1080/02763877.2012.679195.
13. pendell and bowman, “usability study.”
14. lisa carlucci thomas, “libraries, librarians and mobile services,” bulletin of the american society for information science & technology 38, no. 1 (2011): 8–9, http://dx.doi.org/10.1002/bult.2011.1720380105.
15. elder, “how to become the ‘tech guy.’”
16. kim, “responsive web design.”
17.
chad mairn, “three things you can do today to get your library ready for the mobile experience,” the reference librarian 53, no. 3 (2012): 263–69, http://dx.doi.org/10.1080/02763877.2012.678245.
18. rempel and bridges, “that was then.”
19. rachael hu and alison meier, “planning for a mobile future: a user research case study from the california digital library,” serials 24, no. 3 (2011): s17–25.
20. kim, “responsive web design.”
21. lorraine paterson and boon low, “student attitudes towards mobile library services for smartphones,” library hi tech 29, no. 3 (2011): 412–23, http://dx.doi.org/10.1108/07378831111174387.
22. jim hahn, michael twidale, alejandro gutierrez, and reza farivar, “methods for applied mobile digital library research: a framework for extensible wayfinding systems,” the reference librarian 52, no. 1-2 (2011): 106–16, http://dx.doi.org/10.1080/02763877.2011.527600.
23. paterson and low, “student attitudes.”
24. gillian nowlan, “going mobile: creating a mobile presence for your library,” new library world 114, no. 3/4 (2013): 142–50, http://dx.doi.org/10.1108/03074801311304050.
25. elder, “how to become the ‘tech guy.’”
26. matthew connolly, tony cosgrave, and baseema b. krkoska, “mobilizing the library’s web presence and services: a student-library collaboration to create the library’s mobile site and iphone application,” the reference librarian 52, no. 1-2 (2010): 27–35, http://dx.doi.org/10.1080/02763877.2011.520109.
27.
stephan spitzer, “make that to go: re-engineering a web portal for mobile access,” computers in libraries 3, no. 5 (2012): 10–14.
28. houghton, “mobile services.”
29. cody w. hanson, “mobile solutions for your library,” library technology reports 47, no. 2 (2011): 24–31, https://journals.ala.org/ltr/article/view/4475/5222.
30. terence k. huwe, “using apps to extend the library’s brand,” computers in libraries 33, no. 2 (2013): 27–29.
31. edward iglesias and wittawat meesangnill, “mobile website development: from site to app,” bulletin of the american society for information science and technology 38, no. 1 (2011): 18–23.
32. jeff wisniewski, “mobile usability,” bulletin of the american society for information science & technology 38, no. 1 (2011): 30–32, http://dx.doi.org/10.1002/bult.2011.1720380108.
33. jeff wisniewski, “mobile websites with minimal effort,” online 34, no. 1 (2010): 54–57.
34. hahn et al., “methods for applied mobile digital library research.”
35. j. michael demars, “smarter phones: creating a pocket sized academic library,” the reference librarian 53, no. 3 (2012): 253–62, http://dx.doi.org/10.1080/02763877.2012.678236.
36. kim griggs, laurie m. bridges, and hannah gascho rempel, “library/mobile: tips on designing and developing mobile websites,” code4lib no. 8 (2009), http://journal.code4lib.org/articles/2055.
37. demars, “smarter phones.”
38. hahn et al., “methods for applied mobile digital library research.”
39. beth stahr, “text message reference service: five years later,” the reference librarian 52, no.
1-2 (2011): 9–19, http://dx.doi.org/10.1080/02763877.2011.524502.
40. paterson and low, “student attitudes.”
41. ibid.
42. ibid.
43. hanson, “mobile solutions for your library.”
44. stahr, “text message reference service.”
45. spitzer, “make that to go.”
46. allison bolorizadeh et al., “making instruction mobile,” the reference librarian 53, no. 4 (2012): 373–83, http://dx.doi.org/10.1080/02763877.2012.707488.
47. maura keating, “will they come? get out the word about going mobile,” the reference librarian 52, no. 1-2 (2010): 20–26, http://dx.doi.org/10.1080/02763877.2010.520111.
48. paterson and low, “student attitudes.”
49. hanson, “mobile solutions for your library.”
50. paterson and low, “student attitudes.”
51. hanson, “mobile solutions for your library.”
52. cody w. hanson, “why worry about mobile?,” library technology reports 47, no. 2 (2011): 5–10, https://journals.ala.org/ltr/article/view/4476.
53. keating, “will they come?”
54. spitzer, “make that to go.”
55. kim, “responsive web design.”
56. wisniewski, “mobile usability.”
57. elder, “how to become the ‘tech guy.’”
58. sally wilson and graham mccarthy, “the mobile university: from the library to the campus,” reference services review 38, no. 2 (2010): 214–32, http://dx.doi.org/10.1108/00907321011044990.
59. brendan ryan, “developing library websites optimized for mobile devices,” the reference librarian 52, no. 1-2 (2010): 128–35, http://dx.doi.org/10.1080/02763877.2011.527792.
60. kim, “responsive web design.”
61. connolly, cosgrave, and krkoska, “mobilizing the library’s web presence and services.”
62.
demars, “smarter phones.”
63. mark andy west, arthur w. hafner, and bradley d. faust, “expanding access to library collections and services using small-screen devices,” information technology & libraries 25 (2006): 103–7.
64. houghton, “mobile services.”
65. rempel and bridges, “that was then.”
66. elder, “how to become the ‘tech guy.’”
67. heather williams and anne peters, “and that’s how i connect to my library: how a 42-second promotional video helped to launch the utsa libraries’ new summon mobile application,” the reference librarian 53, no. 3 (2012): 322–25, http://dx.doi.org/10.1080/02763877.2012.679845.
68. hahn et al., “methods for applied mobile digital library research.”
69. danielle andre becker, ingrid bonadie-joseph, and jonathan cain, “developing and completing a library mobile technology survey to create a user-centered mobile presence,” library hi tech 31, no. 4 (2013): 688–99, http://dx.doi.org/10.1108/lht-03-2013-0032.
70. rempel and bridges, “that was then.”
71. iglesias and meesangnill, “mobile website development.”
72. elder, “how to become the ‘tech guy.’”
73. andrew walsh, “mobile information literacy: a preliminary outline of information behavior in a mobile environment,” journal of information literacy 6, no. 2 (2012): 56–69, http://dx.doi.org/10.11645/6.2.1696.
74. back and bailey, “web services and widgets.”
75. ibid.
76. ibid.
77. spitzer, “make that to go.”
78. iglesias and meesangnill, “mobile website development.”
79. bohyun kim, “the present and future of the library mobile experience,” library technology reports 49, no. 6 (2013): 15–28, https://journals.ala.org/ltr/article/view/4506.
80. pendell and bowman, “usability study.”
81.
hahn et al., “methods for applied mobile digital library research.”
82. andromeda yelton, “where to go next,” library technology reports 48, no. 1 (2012): 25–34, https://journals.ala.org/ltr/article/view/4655/5511.
83. ibid.
84. hahn et al., “methods for applied mobile digital library research.”
85. houghton, “mobile services.”
86. ibid.
87. mairn, “three things you can do today.”
88. ibid.
89. tamara pianos, “econbiz to go: mobile search options for business and economics—developing a library app for researchers,” library hi tech 30, no. 3 (2012): 436–48, http://dx.doi.org/10.1108/07378831211266582.
90. demars, “smarter phones.”
91. ryan, “developing library websites.”
92. pendell and bowman, “usability study.”
93. ryan, “developing library websites.”
94. michael j. whitchurch, “qr codes and library engagement,” bulletin of the american society for information science & technology 38, no. 1 (2011): 14–17.
95. back and bailey, “web services and widgets.”
96. jingru hoivik, “global village: mobile access to library resources,” library hi tech 31, no. 3 (2013): 467–77, http://dx.doi.org/10.1108/lht-12-2012-0132.
97. elder, “how to become the ‘tech guy.’”
98. ryan, “developing library websites.”
99. west, hafner, and faust, “expanding access.”
100. hu and meier, “planning for a mobile future.”
101. iglesias and meesangnill, “mobile website development.”
102. wisniewski, “mobile usability.”
103. joe murphy, “using mobile devices for research: smartphones, databases and libraries,” online 34, no. 3 (2010): 14–18.
104.
amy vecchione and margie ruppel, “reference is neither here nor there: a snapshot of sms reference services,” the reference librarian 53, no. 4 (2012): 355–72, http://dx.doi.org/10.1080/02763877.2012.704569.
105. hu and meier, “planning for a mobile future.”
106. wilson and mccarthy, “the mobile university.”
107. project management institute, a guide to the project management body of knowledge (pmbok guide) (newtown square, pa: project management institute, 2013).
108. devendra potnis et al., “skills and knowledge needed to serve as mobile technology consultants in information organizations,” journal of education for library & information science 57 (2016): 187–96.

can bibliographic data be put directly onto the semantic web? | yee 55

martha m. yee

can bibliographic data be put directly onto the semantic web?

this paper is a think piece about the possible future of bibliographic control; it provides a brief introduction to the semantic web and defines related terms, and it discusses granularity and structure issues and the lack of standards for the efficient display and indexing of bibliographic data. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of more frbrized cataloging rules than those about to be introduced to the library community (resource description and access) and in creating an rdf data model for the rules.
i am now in the process of trying to model my cataloging rules in the form of an rdf model, which can also be inspected at http://myee.bol.ucla.edu/. in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is sophisticated enough yet to deal with our data. this article is an attempt to identify some of those areas and explore whether or not the problems i have encountered are soluble—in other words, whether or not our data might be able to live on the semantic web. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

this paper is a think piece about the possible future of bibliographic control; as such, it raises more complex questions than it answers. it is also a report on a work in progress—an experiment in building a resource description framework (rdf) model of frbrized descriptive and subject-cataloging rules. here my focus will be on the data model rather than on the frbrized cataloging rules for gathering data to put in the model, although i hope to have more to say about the latter in the future. the intent is not to present you with conclusions but to present some questions about data modeling that have arisen in the course of the experiment. my premise is that decisions about the data model we follow in the future should be made openly and as a community rather than in a small, closed group of insiders. if we are to move toward the creation of metadata that is more interoperable with metadata being created outside our community, as is called for by many in our profession, we will need to address these complex questions as a community following a period of deep thinking, clever experimentation, and astute political strategizing.

the vision

the semantic web is still a bewitching midsummer night’s dream.
it is the idea that we might be able to replace the existing html-based web consisting of marked-up documents—or pages—with a new rdf-based web consisting of data encoded as classes, class properties, and class relationships (semantic linkages), allowing the web to become a huge shared database. some call this web 3.0, with hyperdata replacing hypertext.

embracing the semantic web might allow us to do a better job of integrating our content and services with the wider internet, thereby satisfying the desire for greater data interoperability that seems to be widespread in our field. it also might free our data from the proprietary prisons in which it is currently held and allow us to cooperate in developing open-source software to index and display the data in much better ways than we have managed to achieve so far in vendor-developed ils opacs or in giant, bureaucratic bibliographic empires such as oclc worldcat. the semantic web also holds the promise of allowing us to make our work more efficient.

in this bewitching vision, we would share in the creation of uniform resource identifiers (uris) for works, expressions, manifestations, persons, corporate bodies, places, subjects, and so on. at the uri would be found all of the data about that entity, including the preferred name and the variant names, but also including much more data about the entity than we currently put into our work (name-title and title), such as personal name, corporate name, geographic, and subject authority records. if any of that data needed to be changed, it would be changed only once, and the change would be immediately accessible to all users, libraries, and library staff by means of links down to local data such as circulation, acquisitions, and binding data. each work would need to be described only once at one uri, each expression would need to be described only once at one uri, and so forth.
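the change-once, see-everywhere idea described above can be sketched in a few lines of python. this is only an illustration of the mechanism, not any actual authority file: the uri, the names, and the record fields are all invented for the example.

```python
# a shared registry holds one record per entity, addressed by uri;
# local catalog records store only the uri, never the name itself.
registry = {
    "http://example.org/person/42": {
        "preferred_name": "twain, mark",
        "variant_names": ["clemens, samuel langhorne"],
    }
}

local_record_a = {"title": "adventures of huckleberry finn",
                  "creator": "http://example.org/person/42"}
local_record_b = {"title": "the innocents abroad",
                  "creator": "http://example.org/person/42"}

def display_name(record):
    # resolve the uri at display time; the name lives in exactly one place
    return registry[record["creator"]]["preferred_name"]

# correcting the entity once at its uri updates every record that links to it
registry["http://example.org/person/42"]["preferred_name"] = "twain, mark, 1835-1910"

print(display_name(local_record_a))  # twain, mark, 1835-1910
print(display_name(local_record_b))  # twain, mark, 1835-1910
```

the design choice this illustrates is indirection: because the local records hold a link rather than a copy, there is nothing to re-key when the shared description changes.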
very much up in the air is the question of what institutional structures would support the sharing of the creation of uris for entities on the semantic web. for the data to be reliable, we would need to have a way to ensure that the system would be under the control of people who had been educated about the value of clean and accurate entity definition, the value of choosing “most commonly known” preferred forms (for display in lists of multiple different entities), and the value of providing access under all variant forms likely to be sought. at the same time, we would need a mechanism to ensure that any interested members of the public could contribute to the effort of gathering variants or correcting entity definitions when we have had inadequate information. for example, it would be very valuable to have the input of a textual or descriptive bibliographer applied to difficult questions concerning particular editions, issues, and states of a significant literary work. it would also be very valuable to be able to solicit input from a subject expert in determining the bounds of a concept entity (subject heading) or class entity (classification).

martha m. yee (myee@ucla.edu) is cataloging supervisor at the university of california, los angeles film and television archive. 56 information technology and libraries | june 2009

the experiment (my project)

to explore these bewitching ideas, i have been conducting an experiment. as part of my experiment, i designed a set of cataloging rules that are more frbrized than is rda in the sense that they more clearly differentiate between data applying to expression and data applying to manifestation. note that there is an underlying assumption in both frbr (which defines expression quite differently from manifestation) and on my part, namely that catalogers always know whether a given piece of data applies at either the expression or the manifestation level.
that assumption is open to questioning in the process of the experiment as well. my rules also call for creating a more hierarchical and degressive relationship between the frbr entities work, expression, manifestation, and item, such that data pertaining to the work does not need to be repeated for every expression, data pertaining to the expression does not need to be repeated for every manifestation, and so forth. degressive is an old term used by bibliographers for bibliographies that provide great detail about first editions and less detail for editions after the first. i have adapted this term to characterize my rules, according to which the cataloger begins by describing the work; any details that pertain to all expressions and manifestations of the work are not repeated in the expression and manifestation descriptions.

this paper would be entirely too long if i spent any more time describing the rules i am developing, which can be inspected at http://myee.bol.ucla.edu. here, i would like to focus on the data-modeling process and the questions about the suitability of rdf and the semantic web for encoding our data. (by the way, i don’t seriously expect anyone to adopt my rules! they are radically different from the rules currently being applied and would represent a revolution in cataloging practice that we may not be up to undertaking in the current economic climate. their value lies in their thought-experiment aspect and their ability to clarify what entities we can model and what entities we may not be able to model.)

i am now in the process of trying to model my cataloging rules in the form of an rdf model (“rdf” as used in this paper should be considered from now on to encompass rdf schema [rdfs], web ontology language [owl], and simple knowledge organization system [skos] unless otherwise stated); this model can also be inspected at http://myee.bol.ucla.edu.
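the degressive idea described above — record a detail once at the work level and let every expression and manifestation inherit it unless overridden — can be sketched with python’s `ChainMap`. this is a toy reading of the approach, not the author’s actual model, and the bibliographic values are invented.

```python
# each level holds only its own data; lookups fall back up the chain
# manifestation -> expression -> work, so nothing is repeated downward.
from collections import ChainMap

work = {"title": "moby dick", "creator": "melville, herman", "language": "eng"}

# a french translation: only the content changes are recorded here
expression = ChainMap({"language": "fre", "translator": "giono, jean"}, work)

# a particular published carrier of that translation
manifestation = ChainMap({"publisher": "gallimard", "year": "1941"}, expression)

print(manifestation["title"])     # inherited from the work: moby dick
print(manifestation["language"])  # overridden at the expression: fre
```

the override order matters: because the expression’s mapping sits in front of the work’s, a change in content (the translation’s language) shadows the work-level value while everything unchanged is still found by falling through the chain.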
in the process of doing this, i have discovered a number of areas in which i am not sure that rdf is yet sophisticated enough to deal with our data. this article is an attempt to outline some of those areas and explore whether the problems i have encountered are soluble, in other words, whether or not our data might be able to live on the semantic web eventually. i have already heard from rdf experts bruce d’arcus (miami university) and rob styles (developer at talis, a semantic web technology company), whom i cite later, but through this article i hope to reach a larger community. my research questions can be found later, but first some definitions.

definition of terms

the semantic web is a way to represent knowledge; it is a knowledge-representation language that provides ways of expressing meaning that are amenable to computation; it is also a means of constructing knowledge-domain maps consisting of class and property axioms with a formal semantics.

rdf is a family of specifications for methods of modeling information that underpins the semantic web through a variety of syntax formats; an rdf metadata model is based on making statements about resources in the form of triples that consist of

1. the subject of the triple (e.g., “new york”);
2. the predicate of the triple that links the subject and the object (e.g., “has the postal abbreviation”); and
3. the object of the triple (e.g., “ny”).

xml is commonly used to express rdf, but it is not a necessity; it can also be expressed in notation 3 or n3, for example.1

rdfs is an extensible knowledge-representation language that provides basic elements for the description of ontologies, also known as rdf vocabularies. using rdfs, statements are made about resources in the form of

1. a class (or entity) as subject of the rdf triple (e.g., “new york”);
2.
a relationship (or semantic linkage) as predicate of the rdf triple that links the subject and the object (e.g., “has the postal abbreviation”); and
3. a property (or attribute) as object of the rdf triple (e.g., “ny”).

owl is a family of knowledge representation languages for authoring ontologies compatible with rdf.

skos is a family of formal languages built upon rdf and designed for representation of thesauri, classification schemes, taxonomies, or subject-heading systems.

research questions

actually, the full-blown semantic web may not be exactly what we need. remember that the fundamental definition of the semantic web is “a way to represent knowledge.” the semantic web is a direct descendant of the attempt to create artificial intelligence, that is, of the attempt to encode enough knowledge of the real world to allow a computer to reason about reality in a way indistinguishable from the way a human being reasons. one of the research questions should probably be whether or not the technology developed to support the semantic web can be used to represent information rather than knowledge. fortunately, we do not need to represent all of human knowledge—we simply need to describe and index resources to facilitate their retrieval. we need to encode facts about the resources and what the resources discuss (what they are “about”), not facts about “reality.” based on our past experience, doing even this is not as simple as people think it is. the question is whether we could do what we need to do within the context of the semantic web. sometimes things that sound simple do not turn out to be so simple in the doing.

my research questions are as follows:

1. is it possible for catalogers to tell in all cases whether a piece of data pertains to the frbr expression or the frbr manifestation?
2. is it possible to fit our data into rdf?
given that rdf was designed to encode knowledge rather than information, perhaps it is the wrong technology to use for our purposes?
3. if it is possible to fit our data into rdf, is it possible to use that data to design indexes and displays that meet the objectives of the catalog (i.e., providing an efficient instrument to allow a user to find a particular work of which the author and title are known, a particular expression of a work, all of the works of an author, all of the works in a given genre or form, or all of the works on a particular subject)?

as stated previously, i am not yet ready to answer these questions. i hope to find answers in the course of developing the rules and the model. in this paper, i am focusing on raising the questions about the suitability of rdf to our data that have come up in the course of my work.

other relevant projects

other relevant projects include the following:

1. frbr, functional requirements for authority data (frad), functional requirements for subject authority records (frsar), and frbr-object-oriented (frbroo). all are attempts to create conceptual models of bibliographic entities using an entity-relationship model that is very similar to the class-property model used by rdf.2
2. various initiatives at the library of congress (lc), such as lc subject headings (lcsh) in skos,3 the lc name authority file in skos,4 the lccn permalink project to create persistent uris for bibliographic records,5 and initiatives to provide skos representations for vocabularies and data elements used in marc, premis, and mets. these all represent attempts to convert our existing bibliographic data into uris that stand for the bibliographic entities represented by bibliographic records and authority records; the uris would then be available for experiments in putting our data directly onto the semantic web.
3.
the dc-rda task group project to put rda data elements into rdf.6 as noted previously and discussed further later, rda is less frbrized than my cataloging rules, but otherwise this project is very similar to mine.
4. dublin core’s (dc’s) work on an rdf schema.7 dublin core is very focused on manifestation and does not deal with expressions and works, so it is less similar to my project than is the dc-rda task group’s project (see further discussion later).

why my project?

one might legitimately ask why there is a need for a different model than the ones already provided by frbr, frad, frsar, frbroo, rda, and dc. the frbr and rda models are still tied to the model that is implicit in our current bibliographic data in which expression and manifestation are undifferentiated. this is because publishers publish and libraries acquire and shelve manifestations. in our current bibliographic practice, a new bibliographic record is made for either a new manifestation or a new expression. thus, in effect, there is no way for a computer to tell one from the other in our current data. despite the fact that frbr has good definitions of expression (change in content) and manifestation (mere change in carrier), it perpetuates the existing implicit model in its mapping of attributes to entities. for example, frbr maps the following to manifestation: edition statements (“2nd rev. ed.”); statements of responsibility that identify translators, editors, and illustrators; physical description statements that identify illustrated editions; extent statements that differentiate expressions (the 102-minute version vs. the 89-minute version); etc. thus the frbr definition of expression recognizes that a 2nd revised edition is a new expression, but frbr maps the edition statement to manifestation.
in my model, i have tried to differentiate more cleanly data applying to expressions from data applying to manifestations.8

frbr and rda tend to assume that our current bibliographic data elements map to one and only one group 1 entity or class. there are exceptions, such as title, which frbr and rda define at work, expression, and manifestation levels. however, there is a lack of recognition that, to create an accurate model of the bibliographic universe, more data elements need to be applied at the work and expression level in addition to (or even instead of) the manifestation level. in the appendix i have tried to contrast the frbr, frad, and rda models with mine. in my model, many more data elements (properties and attributes) are linked to the work and expression level. after all, if the expression entity is defined as any change in work content, the work entity needs to be associated with all content elements that might change, such as the original extent of the work, the original statement of responsibility, whether illustrations were originally present, whether color was originally present in a visual work, whether sound was originally present in an audiovisual work, the original aspect ratio of a moving image work, and so on.

frbr also tends to assume that our current data elements map to one and only one entity. in working on my model, i have come to the conclusion that this is not necessarily true. in some cases, a data element pertaining to a manifestation also pertains to the expression and the work. in other cases, the same data element is specific to that manifestation, and, in other cases, the same data element is specific to its expression. this is true of most of the elements of the bibliographic description.
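the separation being argued for here — statements attached to the work, the expression, or the manifestation, each at its proper level — can be made concrete with a toy triple set in the spirit of the rdf triple model defined earlier. the subject strings stand in for real uris, and all of the bibliographic values are invented for illustration.

```python
# each statement is a (subject, predicate, object) tuple; the entities
# form a chain manifestation -> expression -> work, and each statement
# is attached at the level where the data actually applies.
triples = {
    ("work/1", "has title", "moby dick"),
    ("work/1", "has creator", "melville, herman"),
    ("expr/1", "is expression of", "work/1"),
    ("expr/1", "has language", "fre"),            # content change
    ("expr/1", "has translator", "giono, jean"),  # content change
    ("man/1", "is manifestation of", "expr/1"),
    ("man/1", "has publisher", "gallimard"),      # carrier data
    ("man/1", "has date", "1941"),                # carrier data
}

def about(subject):
    """all (predicate, object) pairs asserted about the given entity."""
    return {(p, o) for (s, p, o) in triples if s == subject}

# the translation (a change in content) lives at the expression level,
# while the publisher (mere carrier data) lives at the manifestation level
print(about("expr/1"))
print(about("man/1"))
```

a computer querying this data can tell expression-level facts from manifestation-level facts directly, which is exactly what the undifferentiated bibliographic record described above cannot support.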
frad, in attempting to deal with the fact that our current cataloging rules allow a single person to have several bibliographic identities (or pseudonyms), treats person, name, and controlled access point as three separate entities or classes. i have tried to keep my model simpler and more elegant by treating only person as an entity, with preferred name and variant name as attributes or properties of that entity.

frbroo is focused on the creation process for works, with special attention to the creation of unique works of art and other one-off items found in museums. thus frbroo tends to neglect the collocation of the various expressions that develop in the history of a work that is reproduced and published, such as translations, abridged editions, editions with commentary, etc.

dc has concentrated exclusively on the description of manifestations and has neglected expression and work altogether.

one of the tenets of semantic web development is that, once an entity is defined by a community, other communities can reuse that entity without defining it themselves. the very different definitions of the work and expression entities in the different communities described above raise some serious questions about the viability of this tenet.

assumptions

it should be noted that this entire experiment is based on two assumptions about the future of human intervention for information organization. these two assumptions are based on the even bigger assumption that, even though the internet seems to be an economy based on free intellectual labor, and even though human intervention for information organization is expensive (and therefore at more risk than ever), human intervention for information organization is worth the expense.

assumption 1: what we need is not artificial intelligence, but a better human–machine partnership such that humans can do all of the intellectual labor and machines can do all of the repetitive clerical labor.
currently, catalogers spend too much time on the latter because of the poor design of current systems for inputting data. the universal employment provided by paying humans to do the intellectual labor of building the semantic web might be just the stimulus our economy needs.

assumption 2: those who need structured and granular data—and the precise retrieval that results from it—to carry out research and scholarship may constitute an elite minority rather than most of the people of the world (sadly), but that talented and intelligent minority is an important one for the cultural and technological advancement of humanity. it is even possible that, if we did a better job of providing access to such data, we might enable the enlargement of that minority.

can bibliographic data be put directly onto the semantic web? | yee 59

granularity and structure issues

as soon as one starts to create a data model, one encounters granularity or cataloger-data parsing issues. these issues have actually been with us all along as we developed the data model implicit in aacr2r and marc 21. those familiar with rda, frbr, and frad development will recognize that much of that development is directed at increasing structure and granularity in cataloger-produced data to prepare for moving it onto the semantic web. however, there are clear trade-offs in an increase in structure and granularity. more structure and more granularity make possible more powerful indexing and more sophisticated display, but more structure and more granularity are more complex and expensive to apply and less likely to be implemented in a standard fashion across all communities; that is, it is less likely that interoperable data would be produced.
any switching or mapping that was employed to create interoperable data would produce the lowest common denominator (the simplest and least granular data), and once rendered interoperable, it would not be possible for that data to swim back upstream to regain its lost granularity. data with less structure and less granularity could be easier and cheaper to apply and might have the potential to be adopted in a more standard fashion across all communities, but that data would limit the degree to which powerful indexing and sophisticated display would be possible. take the example of a personal name: currently, we demarcate surname from forename by putting the surname first, followed by a comma and then the forename. even that amount of granularity can sometimes pose a problem for a cataloger who does not necessarily know which part of the name is surname and which part is forename in a culture unfamiliar to the cataloger. in other words, the more granularity you desire in your data, the more often the people collecting the data are going to encounter ambiguous situations. another example: currently, we do not collect information about gender self-identification; if we were to increase the granularity of our data to gather that information, we would surely encounter situations in which the cataloger would not necessarily know if a given creator was self-defined as a female or a male or of some other gender identity. presently, if we are adding a birth and death date, whatever dates we use are all together in a $d subfield without any separate coding to indicate which date is the birth date and which is the death date (although an occasional “b.” or “d.” will tell us this kind of information). we could certainly provide more granularity for dates, but that would make the marc 21 format much more complex and difficult to learn. 
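as an illustration of the trade-off, here is a minimal sketch in python of what more granular date coding would demand. the function name and the set of $d patterns it accepts are invented for the example; the point is that an undifferentiated $d string must be parsed into separate birth and death elements, and ambiguous strings force a human decision.

```python
import re

def parse_subfield_d(d_value):
    """split an undifferentiated marc 21 $d string such as '1835-1910' into
    hypothetical, more granular birth and death elements.
    returns (birth, death); a component is None when the string does not say."""
    # an occasional 'b.' or 'd.' marks a lone birth or death date
    m = re.fullmatch(r"b\.\s*(\d{3,4})", d_value)
    if m:
        return (m.group(1), None)
    m = re.fullmatch(r"d\.\s*(\d{3,4})", d_value)
    if m:
        return (None, m.group(1))
    # a plain span '1835-1910' or an open span '1835-'
    m = re.fullmatch(r"(\d{3,4})-(\d{3,4})?", d_value)
    if m:
        return (m.group(1), m.group(2))
    # anything else ('fl. 1500', 'ca. 1400', ...) stays unparsed:
    # added granularity means more cases a human must decide
    return (None, None)
```

so "d. 1910" yields an explicit death date, while "fl. 1500" yields nothing at all: the coding gains indexing power at the price of more rules and more unresolvable cases.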
people who dislike the marc 21 format already argue that it is too granular and therefore requires too much of a learning curve before people can use it. for example, tennant claims that “there are only two kinds of people who believe themselves able to read a marc record without referring to a stack of manuals: a handful of our top catalogers and those on serious drugs.”9 how much of the granularity already available in marc 21 is actually used in existing records, and how much of it, even when present, is exploited by indexing and display software? granularity costs money, and libraries and archives are already starving for resources. granularity can only be provided by people, and people are expensive. granularity and structure also exist in tension with each other. more granularity can lead to less structure (or more complexity to retain structure along with granularity). in the pursuit of more granularity of data than we have now, rda, attempting to support rdf–compliant xml encoding, has been atomizing data to make it useful to computers, but this will not necessarily make the data more useful to humans. to be useful to humans, it must be possible to group and arrange (sort) the data meaningfully, both for indexing and for display. the developers of skos refer to the “vast amounts of unstructured (i.e., human readable) information in the web,”10 yet labeling bits of data as to type and recording semantic relationships in a machine-actionable way do not necessarily provide the kind of structure necessary to make data readable by humans and therefore useful to the people the web is ultimately supposed to serve. consider the case of music instrumentation.
if you have a piece of music for five guitars and one flute, and you simply code number and instrumentation without any way to link “five” with “guitars” and “one” with “flute,” you will not be able to guarantee that a person looking for music for five flutes and one guitar will not be given this piece of music in their results (see figure 1).11 the more granular the data, the less the cataloger can build order, sequencing, and linking into the data; the coding must be carefully designed to allow the desired order, sequencing, and linking for indexing and display to be possible, which might call for even more complex coding. it would be easy to lose information about order, sequencing, and linking inadvertently.

actually, there are several different meanings for the term structure:

1. structure is an object of a record (structure of document?); for example, elings and waibel refer to “data fields . . . also referred to as elements . . . which are organized into a record by a data structure.”12
2. structure is the communications layer, as opposed to the display layer or content designation.13
3. structure is the record, field, and subfield.
4. structure is the linking of bits of data together in the form of various types of relationships.
5. structure is the display of data in a structured, ordered, and sequenced manner to facilitate human understanding.
6. data structure is a way of storing data in a computer so that it can be used efficiently (this is how computer programmers use the term).

information technology and libraries | june 2009

i hasten to add that i am definitely in favor of adding more structure and granularity to our data when it is necessary to carry out the fundamental objectives of our profession and of our catalogs. i argued earlier that frbr and rda are not granular enough when it comes to the distinction between data elements that apply to expression and those that apply to manifestation.
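the five-guitars-and-one-flute problem just described can be sketched with triples and blank nodes, the grouping technique shown in figure 1. the sketch below uses python tuples as triples; the identifiers (ex:expr1, ex:hasInstrumentation, and so on) are invented for illustration, not taken from any published vocabulary.

```python
# triples as (subject, predicate, object); the blank nodes _:g1 and _:g2
# group each number with its instrument type, as in figure 1a.
work_a = [
    ("ex:expr1", "ex:hasInstrumentation", "_:g1"),
    ("_:g1", "ex:numberOfInstrument", "5"),
    ("_:g1", "ex:typeOfInstrument", "guitar"),
    ("ex:expr1", "ex:hasInstrumentation", "_:g2"),
    ("_:g2", "ex:numberOfInstrument", "1"),
    ("_:g2", "ex:typeOfInstrument", "flute"),
]

def instrumentation(triples, expr):
    """return the set of (number, instrument) pairs for an expression."""
    groups = [o for s, p, o in triples
              if s == expr and p == "ex:hasInstrumentation"]
    pairs = set()
    for g in groups:
        num = next(o for s, p, o in triples
                   if s == g and p == "ex:numberOfInstrument")
        typ = next(o for s, p, o in triples
                   if s == g and p == "ex:typeOfInstrument")
        pairs.add((num, typ))
    return pairs
```

because each number is attached to its instrument type through an intermediate blank node, a search for five flutes cannot accidentally match this five-guitars-and-one-flute expression; without the grouping node, the same six values would be indistinguishable from the reverse combination.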
if we could just agree on how to differentiate data applying to the manifestation from data applying to the expression instead of our current practice of identifying works with headings and lumping all manifestation and expression data together, we could increase the level of service we are able to provide to users a thousandfold. however, if we are not going to commit to differentiating between expression and manifestation, it would be more intellectually honest for frbr and rda to take the less granular path of mapping all existing bibliographic data to manifestation and expression undifferentiated, that is, to use our current data model unchanged and state this openly. i am not in favor of adding granularity for granularity’s sake or for the sake of vague conceptions of possible future use. granularity is expensive and should be used only in support of clear and fundamental objectives.

[figure 1a. extract from yee rdf model that illustrates one technique for modeling musical instrumentation at the expression level (using a blank node to group repeated number and instrument type)]

[figure 1b. example of encoding of musical instrumentation at the expression level based on the above model: “5 guitars” and “1 flute,” each pairing a number of a particular instrument with a type of instrument]

the goal: efficient displays and indexes

my main concern is that we model and then structure the data in a way that allows us to build the complex displays that are necessary to make catalogs appear simple to use. i am aware that the current orthodoxy is that recording data should be kept completely separate from indexing and display (“the applications layer”).
because i have spent my career in a field in which catalog records are indexed and displayed badly by systems people who don’t seem to understand the data contained in them, i am a skeptic. it is definitely possible to model and structure data in such a way that desired displays and indexes are impossible to construct. i have seen it happen! the lc working group report states that “it will be recognized that human users and their needs for display and discovery do not represent the only use of bibliographic metadata; instead, to an increasing degree, machine applications are their primary users.”14 my fear is that the underlying assumption here is that users need to (and can) retrieve the single perfect record. this will never be true for bibliographic metadata. users will always need to assemble all relevant records (of all kinds) as precisely as possible and then browse through them before making a decision about which resources to obtain. this is as true in the semantic web—where “records” can be conceived of as entity or class uris—as it is in the world of marc–encoded metadata. some of the problems that have arisen in the past in trying to index bibliographic metadata for humans are connected to the fact that existing systems do not group all of the data related to a particular entity effectively, such that a user can use any variant name or any combination of variant names for an entity and do a successful search. currently, you can only look for a match among two or more keywords within the bounds of a single manifestation-based bibliographic record or within the bounds of a single heading, minus any variant terms for that entity. 
thus, when you do a keyword search for two keywords, for example, “clemens” and “adventures,” you will retrieve only those manifestations of mark twain’s adventures of tom sawyer that have his real name (clemens) and the title word “adventures” co-occurring within the bounded space created by a single manifestation-based bibliographic record. instead, the preferred forms and the variant forms for a given entity need to be bounded for indexing such that the keywords the user employs to search for that entity can be matched using co-occurrence rules that look for matches within a single bounded space representing the entity desired. we will return to this problem in the discussion of issue 3 in the later section “rdf problems encountered.” the most complex indexing problem has always proven to be the grouping or bounding of data related to a work, since it requires pulling in all variants for the creator(s) of that work as well. otherwise, a user who searches for a work using a variant of the author’s name and a variant of the title will continue to fail (as they do in all current opacs), even when the desired work exists in the catalog. if we could create a uri for the adventures of tom sawyer that included all variant names for the author and all variant titles for the work (including the variant title tom sawyer), the same keyword search described above (“clemens” and “adventures”) could be made to retrieve all manifestations and expressions of the adventures of tom sawyer, instead of the few isolated manifestations that it would retrieve in current catalogs. we need to make sure that we design and structure the data such that the following displays are possible:

- display all works by this author in alphabetical order by title with the sorting element (title) appearing at the top of each work displayed.
- display all works on this subject in alphabetical order by principal author and title (with principal author and title appearing at top of each work displayed), or title if there is no principal author (with title appearing at top of each work displayed).

we must ensure that we design and structure the data in such a way that our structure allows us to create subgroups of related data, such as instrumentation for a piece of music (consisting of a number associated with each particular instrument), place and related publisher for a certain span of dates on a serial title change record, and the like.

which standards will carry out which functions?

currently, we have a number of different standards to carry out a number of different functions; we can speculate about how those functions might be allocated in a new semantic web–based dispensation, as shown in table 1. in table 1, data structure is taken to mean what a record represents or stands for; traditionally, a record has represented an expression (in the days of hand-press books) or a manifestation (ever since reproduction mechanisms have become more sophisticated, allowing an explosion of reproductions of the same content in different formats and coming from different distributors). rda is record-neutral; rdf would allow uris to be established for any and all of the frbr levels; that is, there would be a uri for a particular work, a uri for a particular expression, a uri for a particular manifestation, and a uri for a particular item. note that i am not using data structure in the sense that a computer programmer does (as a way of storing data in a computer so that it can be used efficiently).
table 1. possible reallocation of current functions in a new semantic web–based dispensation

function: data content, or content guidelines (rules for providing data in a particular element)
  current: defined by aacr2r and marc 21
  future: defined by rda and rdf/rdfs/owl/skos

function: data elements
  current: defined by isbd–based aacr2r and marc 21
  future: defined by rda and rdf/rdfs/owl/skos

function: data values
  current: defined by lc/naco authority file, lcsh, marc 21 coded data values, etc.
  future: defined as ontologies using rdf/rdfs/owl/skos

function: encoding or labeling of data elements for machine manipulation (same as data format?)
  current: defined by iso 2709–based marc 21
  future: defined by rdf/rdfs/xml

function: data structure (i.e., what a record stands for)
  current: defined by aacr2r and marc 21; also frbr?
  future: defined by rdf/rdfs/owl/skos

function: schematization (constraint on structure and content)
  current: marc 21, mods, dcmi abstract model
  future: defined by rdf/rdfs/owl/skos

function: encoding of facts about entity relationships
  current: carried out by matching data value strings (headings found in lc/naco authority file and lcsh, issn’s, and the like)
  future: carried out by rdf/rdfs/owl/skos in the form of uri links

function: display rules
  current: ils software, formerly isbd–based aacr2r
  future: “application layer” or yee rules

function: indexing rules
  current: ils software
  future: sparql, “application layer,” or yee rules

currently, the encoding of facts about entity relationships (see table 1) is carried out by matching data-value character strings (headings or linking fields using issns and the like) that are defined by the lc/naco authority file (following aacr2r rules), lcsh (following rules in the subject cataloging manual), etc. in the future, this function might be carried out by using rdf to link the uri for a resource to the uri for a data value. display rules (see table 1) are currently defined by isbd and aacr2r but widely ignored by systems, which frequently truncate bibliographic records arbitrarily in displays, supply labels, and the like; rda abdicates responsibility, pushing display out of the cataloging rules. the general principle on the web is to divorce data from display and allow anyone to display the data any way they want. display is the heart of the objects (or goals) of cataloging: the point is to display to the user the works of an author, the editions of a work, or the works on a subject. all of these goals only can be met if complex, high-quality displays can be built from the data created according to the data model. indexing rules (see table 1) were once under the control of catalogers (in book and card catalogs) in that users had to navigate through headings and cross-references to find what they wanted; currently indexing is in the hands of system designers who prefer to provide keyword indexing of bibliographic (i.e., manifestation-based) records rather than provide users with access to the entities they are really interested in (works, authors and subjects), all represented currently by authority records for headings and cross-references. rda abdicates responsibility, pushing indexing concerns completely out of the cataloging rules. the general principle on the web is to allow resources to be indexed by any web search engines that wish to index them. current web data is not structured at all for either indexing or display. i would argue that our interest in the semantic web should be focused on whether or not it will support more data structure—as well as more logic in that data structure—to support better indexes and better displays than we have now in manifestation-based ils opacs. crucial to better indexing than we have ever had before are the co-occurrence rules for keyword indexing, that is, the rules for when a co-occurrence of two or more keywords should produce a match.
we need to be able to do a keyword search across all possible variant names for the entity of interest, and the entity of interest for the average catalog user is much more likely to be a particular work than to be a particular manifestation. unfortunately, catalog-use studies only have studied so-called known-item searches without investigating whether a known-item searcher was looking for a particular edition or manifestation of a work or was simply looking for a particular work in order to make a choice as to edition or manifestation once the work was found. however, common sense tells us that it is a rare user who approaches the catalog with prior knowledge about all published editions of a given work. the more common situation is surely one in which a user desires to read a particular shakespeare play or view a particular david lean film and discovers that the desired work exists in more than one expression or manifestation only after searching the catalog. we need to have the keyword(s) in our search for a particular work co-occur within a bounded space that encompasses all possible keywords that might refer to that particular work entity, including both creator and title keywords. notice in table 1 the unifying effect that rdf could potentially have; it could free us from the use of multiple standards that can easily contradict each other, or at least not live peacefully together. examples are not hard to find in the current environment. one that has cropped up in the course of rda development concerns family names. presently the rules for naming families are different depending on whether the family is the subject of a work (and established according to lcsh) or whether the family is responsible for a collection of papers (and established according to rda). 
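the bounded-space idea can be sketched as follows: a toy python index in which every variant creator name and every variant title is bound to a single work identifier, so that co-occurrence is tested against the work entity rather than against any one manifestation record. the identifiers and variant lists are illustrative only.

```python
# one work entity's "bounded space": all variant creator names and
# variant titles indexed together under one work uri (illustrative)
work_index = {
    "work:tom-sawyer": {
        "creator_variants": ["mark twain", "samuel clemens", "samuel l. clemens"],
        "title_variants": ["the adventures of tom sawyer", "tom sawyer"],
    },
}

def keywords(work):
    """every keyword that might refer to this work entity."""
    kws = set()
    for name in work["creator_variants"] + work["title_variants"]:
        kws.update(name.split())
    return kws

def search(index, *terms):
    """co-occurrence rule: all search terms must match within a single
    work's bounded keyword space, not within a single manifestation record."""
    return [uri for uri, work in index.items()
            if all(t in keywords(work) for t in terms)]
```

a search for “clemens” and “adventures” (or for “twain” and “sawyer”) matches the work entity, and from there all of its expressions and manifestations could be retrieved, even though no single manifestation record contains both keywords.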
types of data

rda has blurred the distinctions among certain types of data, apparently because there is a perception that on the semantic web the same piece of data needs to be coded only once, and all indexing and display needs can be supported from that one piece of data. i question that assumption on the basis of my experience with bibliographic cataloging. all of the following ways of encoding the same piece of data can still have value in certain circumstances:

- transcribed; in rdf terms, a literal (i.e., any data that is not a uri, a constant value). transcribed data is data copied from an item being cataloged. it is valuable for providing access to the form of the name used on a title page and is particularly useful for people who use pseudonyms, corporate bodies that change name, and so on. transcribed data is an important part of the historical record and not just for off-line materials; it can be a historical record of changing data on notoriously fluid webpages.
- composed; in rdf terms, also a literal. composed data is information composed by a cataloger on the basis of observation of the item in hand; it can be valuable for historical purposes to know which data was composed.
- supplied; in rdf terms, also a literal. supplied data is information supplied by a cataloger from outside sources; it can be valuable for historical purposes to know which data was supplied and from which outside sources it came.
- coded; in rdf, represented by a uri. coded data would likely transform on the semantic web into links to ontologies that could provide normalized, human-readable identification strings on demand, thus causing coded and normalized data to merge into one type of data. is it not possible, though, that the coded form of normalized data might continue to provide for more efficient searching for computers as opposed to humans? coded data also has great cross-cultural value, since it is not as language-dependent as literals or normalized headings.
- normalized headings (controlled headings); in rdf, represented by a uri. normalized or controlled headings are still necessary to provide users with coherent, ordered displays of thousands of entities that all match the user’s search for a particular entity (work, author, subject, etc.). the reason google displays are so hideous is that, so far, the data searched lacks any normalized display data. if variant language forms of the name for an entity are linked to an entity uri, it should be possible to supply headings in the language and script desired by a particular user.

the rdf model

those who have become familiar with frbr over the years will probably not find it too difficult to transition from the frbr conceptual model to the rdf model. what frbr calls an “entity,” rdf calls a “subject” and rdfs calls a “class.” what frbr calls an “attribute,” rdf calls an “object” and rdfs calls a “property.” what frbr calls a “relationship,” rdf calls a “predicate” and rdfs calls a “relationship” or a “semantic linkage” (see table 2). the difficulty in any data-modeling exercise lies in deciding what to treat as an entity or class and what to treat as an attribute or property. the authors of frbr decided to create a class called expression to deal with any change in the content of a work. when frbr is applied to serials, which change content with every issue, the model does not work well. in my model, i found it useful to create a new entity at the manifestation level, the serial title, to deal with the type of change that is more relevant to serials, the change in title. i also created another new entity at the manifestation level, title-manifestation, to deal with a change of title in a nonserial work that is not associated with a change in content. one hundred years ago, this entity would have been called title-edition.
i am also in the process of developing an entity at the expression level—surrogate—to deal with reproductions of original artworks that need to inherit the qualities of the original artwork they reproduce without being treated as an edition of that original artwork, which ipso facto is unique. these are just examples of cases in which it is not that easy to decide on the classes or entities that are necessary to accurately model bibliographic information. see the appendix for a complete comparison of the classes and entities defined in four different models: frbr, frad, rda, and the yee cataloging rules (ycr). the appendix also shows variation among these models concerning whether a given data element is treated as a class/entity or as an attribute/property. the most notable examples are name and preferred access point, which are treated as classes/entities in frad, as attributes in frbr and ycr, and as both in rda.

rdf problems encountered

my goal for this paper is to institute discussion with data modelers about which problems i observed are insoluble and which are soluble:

1. is there an assumption on the part of semantic web developers that a given data element, such as a publisher name, should be expressed as either a literal or using a uri (i.e., controlled), but never both? cataloging is rooted in humanistic practices that require careful recording of evidence. there will always be value in distinguishing and labeling the following types of data:

- copied as is from an artifact (transcribed)
- supplied by a cataloger
- categorized by a cataloger (controlled)

tim berners-lee (the father of the world wide web and the semantic web) emphasizes the importance of recording not just data but also its provenance for the sake of authenticity.15 for many data elements, therefore, it will be important to be able to record both a literal (transcribed or composed form or both) and a uri (controlled form). is this a problem in rdf?
as a corollary, if any data that can be given a uri cannot also be represented by a literal (transcribed and composed data, or one or the other), it may not be possible to design coherent, readable displays of the data describing a particular entity. among other things, cataloging is a discursive writing skill. does rdf require that all data be represented only once, either by a literal or by a uri? or is it perhaps possible that data that has a uri could also have a transcribed or composed form as a property? perhaps it will even be possible to store multiple snapshots of online works that change over time to document variant forms of a name for works, persons, and so on.

2. will the internet ever be fast enough to assemble the equivalent of our current records from a collection of hundreds or even thousands of uris? in rdf, links are one-to-one rather than one-to-many. this leads to a great proliferation of reciprocal links. the more granularity there is in the data, the more linking is necessary to ensure that atomized data elements are linked together. potentially, every piece of data describing a particular entity could be represented by a uri leading out to a skos list of data values. the number of links necessary to pull together all of the data just to describe one manifestation could become astronomical, as could the number of one-to-one links necessary to create the appearance of a one-to-many link, such as the link between an author and all the works of an author. is the internet really fast enough to assemble a record from hundreds of uris in a reasonable amount of time?

table 2. the frbr conceptual model translated into rdf and rdfs

frbr          rdf        rdfs
entity        subject    class
attribute     object     property
relationship  predicate  relationship/semantic linkage
given the often slow network throughput typical of many of our current internet connections, is it really practical to expect all of these pieces to be pulled together efficiently to create a single display for a single user? we may yet feel nostalgia for the single manifestation-based record that already has all of the relevant data in it (no assembly required). bruce d’arcus points out, however, that:

    i think if you’re dealing with rdf, you wouldn’t necessarily be gathering these data in real-time. the uris that are the targets for those links are really just global identifiers. how you get the triples is a separate matter. so, for example, in my own personal case, i’m going to put together an rdf store that is populated with data from a variety of sources, but that data population will happen by script, and i’ll still be querying a single endpoint, where the rdf is stored in a relational database.16

in other words, d’arcus essentially will put them all in one place, or in one database that “looks” from a uri perspective to be “one place” where they’re already gathered.

3. is rdf capable of dealing with works that are identified using their creators? we need to treat author as both an entity in its own right and as a property of a work, and in many cases the latter is the more important function for user service. lexical labels, or human-readable identifiers for works that are identified using both the principal author and the title, are particularly problematic in rdf given that the principal author is an entity in its own right. is rdf capable of supporting the indexing necessary to allow a user to search using any variant of the author’s name and any variant of the title of a work in combination and still retrieve all expressions and manifestations of that work, given that author will have a uri of its own, linked by means of a relationship link to the work uri?
is rdf capable of supporting the display of a list of one thousand works, each identified by principal author, in order first by principal author, then by title, then by publication date, given that the preferred heading for each principal author would have to be assembled from the uri for that principal author and the preferred title for each work would have to be assembled from the uri for that work? for fear that this will not, in fact, be possible, i have put a human-readable work-identifier data element into my model that consists of principal author and title when appropriate, even though that means the preferred name of the principal author may not be able to be controlled by the entity record for the principal author. any guidance from experienced data modelers in this regard would be appreciated. according to bruce d’arcus, this is purely an interface or application question that does not require a solution at the data layer.17 since we have never had interfaces or applications that would do this correctly, even though the data is readily available in authority records, i am skeptical about this answer! perhaps bruce’s suggestion under item 9 of designating a sortname property for each entity is the solution here as well. my human-readable work identifier consisting of the name of the principal creator and uniform title of work could be designated the sortname property for the work. it would have to be changed whenever the preferred form of the name for the principal creator changed, however.

4. do all possible inverse relationships need to be expressed explicitly, or can they be inferred? my model is already quite large, and i have not yet defined the inverse of every property as i really should to have a correct rdf model.
in other words, for every property there needs to be an inverse property; for example, the property iscreatorof needs to have the inverse property iscreatedby; thus “twain” has the property iscreatorof, while “adventures of tom sawyer” has the property iscreatedby. perhaps users and inputters will not actually have to see the huge, complex rdf data model that would result from creating all the inverse relationships, but those who maintain the model will have to deal with a great deal of complexity. however, since i’m not a programmer, i don’t know how the complexity of rdf compares to the complexity of existing ils software. 5. can rdf solve the problems we are having now because of the lack of transitivity or inheritance in the data models that underlie current ilses, or will rdf merely perpetuate these problems? we have problems now with the data models that underlie our current ilses because of the inability of these models to deal with hierarchical inheritance, such that whatever is true of an entity in the hierarchy is also true of every entity below that entity in the hierarchy. one example is that of cross-references to a parent corporate body that should be held to apply to all subdivisions of that corporate body but never are in existing ils systems. there is a cross-reference from “fbi” to “united states. federal bureau of investigation,” but not from “fbi counterterrorism division” to “united states. federal bureau of investigation. counterterrorism division.” for that reason, a search in any opac name index for “fbi counterterrorism division” will fail. we need systems that recognize that data about a parent corporate body is relevant to all subdivisions of that parent body. we need systems that recognize that data about a work is relevant to all expressions and manifestations of that work. 
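returning to the question raised in issue 4: one plausible answer, sketched below in python under the assumption that each property declares its inverse once, model-wide, is that inverse triples can be inferred mechanically rather than input by hand. the property names follow the iscreatorof/iscreatedby example above; everything else is invented for the sketch.

```python
# declared once for the whole model: each property and its inverse
INVERSES = {"iscreatorof": "iscreatedby", "iscreatedby": "iscreatorof"}

def with_inverses(triples):
    """materialize the inverse of every triple whose property has a
    declared inverse, so only one direction ever needs to be input."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            inferred.add((o, INVERSES[p], s))
    return inferred

# only the forward direction is stated; the iscreatedby triple is derivable
stated = {("twain", "iscreatorof", "adventures of tom sawyer")}
```

whether such inference belongs in the model itself, in the store, or in the application layer is exactly the design question the issue raises; the sketch only shows that the inverses need not be typed by hand, so inputters and users could be spared the doubled model.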
rdf allows you to link a work to an expression and an expression to a manifestation, but i don't believe it allows you to encode the information that everything that is true of the work is true of all of its expressions and manifestations. rob styles seems to confirm this: "rdf doesn't have hierarchy. in computer science terms, it's a graph, not a tree, which means you can connect anything to anything else in any direction."18 of course, not all links should be this kind of transitive or inheritance link. one expression of work a is linked to another expression of work a by links to work a, but whatever is true of one of those expressions is not necessarily true of the other; one may be illustrated, for example, while the other is not. whatever is true of one work is not necessarily true of another work related to it by a related work link. it should be recognized that bibliographic data is rife with hierarchy. it is one of our major tools for expressing meaning to our users. corporate bodies have corporate subdivisions, and many things that are true for the parent body also are true for its subdivisions. subjects are expressed using main headings and subject subdivisions, and many things that are true for the main heading (such as variant names) also are true for the heading combined with one of its subdivisions. geographic areas are contained within larger geographic areas, and many things that are true of the larger geographic area also are true for smaller regions, counties, cities, etc., contained within that larger geographic area. for all these reasons, i believe that, to do effective displays and indexes for our bibliographic data, it is critical that we be able to distinguish between a hierarchical relationship and a nonhierarchical relationship.

6.
to recognize the fact that the subject of a book or a film could be a work, a person, a concept, an object, an event, or a place (all classes in the model), is there any reason we cannot define subject itself as a property (a relationship) rather than a class in its own right? in my model, all subject properties are defined as having a domain of resource, meaning there is no constraint as to the class to which these subject properties apply. i'm not sure if there will be any fall-out from that modeling decision.

7. how do we distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location? sometimes a place is a jurisdiction and behaves like a corporate body (e.g., united states is the name of the government of the united states). sometimes place is a physical location in which something is located (e.g., the birds discussed in a book about the birds of the united states). to distinguish between the corporate behavior of a jurisdiction and the subject behavior of a geographical location, i have defined two different classes for place: place as jurisdictional corporate body and place as geographic area. will this cause problems in the model? will there be times when it prevents us from making elegant generalizations in the model about place per se? there is a similar problem with events. some events are corporate bodies (e.g., conferences that publish papers) and some are a kind of subject (e.g., an earthquake). i have defined two different classes for event: conference or other event as corporate body creator and event as subject.

8. what is the best way to model a bound-with or an issued-with relationship, or a part–whole relationship in which the whole must be located to obtain the part? the bound-with relationship is actually between two items containing two different works, while the issued-with relationship is between two manifestations containing two different works (see figure 2).
is this a work-to-work relationship? will designating it a work-to-work relationship cause problems for indicating which specific items or manifestation-items of each work are physically located in the same place? this question may also apply to those part–whole relationships in which the part is physically contained within the whole and both are located in the same place (sometimes known as analytics). one thing to bear in mind is that in all of these cases the relationship between two works does not hold between all instances of each work; it only holds for those particular instances that are contained in the particular manifestation or item that is bound with, issued with, or part of the whole. however, if the relationship is modeled as a work-1-manifestation to work-2-manifestation relationship, or a work-1-item to work-2-item relationship, care must be taken in the design of displays to pull in enough information about the two or more works so as not to confuse the user.

9. how do we express the arrangement of elements that have a definite order? i am having trouble imagining how to encode the ordering of data elements that make up a larger element, such as the pieces of a personal name. this is really a desire to control the display of those atomized elements so that they make sense to human beings rather than just to machines. could one define a property such as natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of a person given that the ideal order of these elements might vary from one person to another? could one define properties such as sorting element 1, sorting element 2, sorting element 3, etc., and assign them to the various pieces that will be assembled to make a particular heading for an entity, such as an lcsh heading for a historical period? (depending on the answer to the question in item 11, it may or may not be possible to assign a property to a property in this fashion.)
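the "sorting element 1, 2, 3" idea can be sketched in ordinary code. all names, property layouts, and values here are invented for illustration and are not part of the actual ycr model:

```python
# hypothetical sketch: each atomized piece of a name carries an explicit sort
# position, so a heading can be assembled without hard-coding a surname-first
# rule that fails for persons whose names do not fit that pattern.

# (value, element type, sort position) -- positions can vary person by person
twain = [("mark", "forename", 2), ("twain", "surname", 1)]
sting = [("sting", "single name", 1)]

def assemble_heading(elements):
    """assemble a display/sort heading from explicitly ordered name elements."""
    ordered = sorted(elements, key=lambda e: e[2])
    return ", ".join(value for value, _kind, _pos in ordered)

# the same mechanism then supports the author/title/date ordering of item 3:
# sort works by assembled creator heading, then title, then date
works = [
    {"creator": assemble_heading(twain),
     "title": "adventures of tom sawyer", "date": "1876"},
    {"creator": assemble_heading(twain),
     "title": "adventures of huckleberry finn", "date": "1884"},
]
in_order = sorted(works, key=lambda w: (w["creator"], w["title"], w["date"]))
```

the point of the sketch is only that explicit per-element positions decouple the stored atomized data from any one assembly rule; the maintenance problem noted above (re-deriving every assembled heading when a preferred name changes) remains.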
are there standard sorting rules we need to be aware of (in unicode, for example)? are there other rdf techniques available to deal with sorting and arrangement? bruce d'arcus suggests that, instead of coding the name parts, it would be more useful to designate sortname properties;19 might it not be necessary to designate a sortname property for each variant name as well, for cases in which variants need to appear in sorted displays? and wouldn't these sortname properties complicate maintenance over time as preferred and variant names changed?

10. how do we link related data elements in such a way that effective indexing and displays are possible? some examples: number and kind of instrument (e.g., music written for two oboes and three guitars); multiple publishers, frequencies, subtitles, editors, etc., with date spans for a serial title change (or will it be necessary to create a new manifestation for every single change in subtitle, publisher name, place of publication, etc.?). the assumption seems to be that there will be no repeatable data elements. based on my somewhat limited experience with rdf, it appears that there are record equivalents (every data element—property or relationship—pertaining to a particular entity with a uri), but there are no field or subfield equivalents that allow the sublinking of related pieces of data about an entity. indeed, rob styles goes so far as to argue that ultimately there is no notion of a "record" in rdf.20 it is possible that blank nodes might be able to fill in for fields and subfields in some cases for grouping data, but there are dangers involved in their use.21 to a cataloger, it looks as though the plan is for rdf data to float around loose without any requirement that there be a method for pulling it together into coherent displays designed for human beings.

11. can a property have a property in rdf?
as an example of where it might be useful to define a property of a property, robert maxwell suggests that date of publication is really an attribute (property) of the published by relationship (another property).22 another example: in my model, a variant title for a serial is a property. can that property itself have the property type of variant title to encompass things like spine title, key title, etc.? another example appeared in item 9, in which it is suggested that it might be desirable to assign sort-element properties to the various elements of a name property.

12. how do we document record display decisions? there is no way to record display decisions in rdf itself; it is completely display-neutral. we could not safely commit to a particular rdf-based data model until a significant amount of sample bibliographic data had been created and open-source indexing and display software had been designed and user-tested on that data. it may be that we will need to supplement rdf with some other encoding mechanism that allows us to record display decisions along with the data. current cataloging rules are about display as much as they are about content designation. isbd concerns the order in which the elements should be displayed to humans. the cataloging objectives concern display to users of such entity groups as the works of an author, the editions of a work, and the works on a subject.

13. can all bibliographic data be reduced to either a class or a property with a finite list of values? another way to put this is to ask if all that catalogers do could be reduced to a set of pull-down menus. cataloging is the art of writing discursive prose as much as it is the ability to select the correct value for a particular data element. we must deal with ambiguous data (presented by joe blow could mean that joe created the entire work, produced it, distributed it, sponsored it, or merely funded it). we must sometimes record information without knowing its exact meaning.
we must deal with situations that have not been anticipated in advance. it is not possible to list every possible kind of data and every possible value for each type of data up front before any data is gathered.

figure 2. examples of part–whole relationships. how might these be best expressed in rdf?

issued-with relationship: a copy of charlie chaplin's 1917 film the immigrant can be found on a videodisc compilation called charlie chaplin, the early years along with two other chaplin films. this compilation was published and collected by many different libraries and media centers. if a user wants to view this copy of the immigrant, he or she will first have to locate charlie chaplin, the early years, then look for the desired film at the beginning of the first videodisc in the set. the issued-with relationship between the immigrant and the other two films on charlie chaplin, the early years is currently expressed in the bibliographic record by means of a "with" note: first on charlie chaplin, the early years, v. 1 (62 min.) with: the count – easy street.

bound-with relationship: the university of california, los angeles film & television archive has acquired a reel of 16 mm. film from a collector who strung five warner bros. cartoons together on a single reel of film. we can assume that no other archive, library, or media collection will have this particular compilation of cartoons, so the relationship between the five cartoons is purely local in nature. however, any user at the film & television archive who wishes to view one of these cartoons will have to request a viewing appointment for the entire reel and then find the desired cartoon among the other four on the reel. the bound-with relationship among these cartoons is currently expressed in a holdings record by means of a "with" note: fourth on reel with: daffy doodles – tweety pie – i love to singa – along flirtation walk.
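the issued-with case in figure 2 can be sketched as a relationship between particular manifestations rather than between works, which captures the point made in item 8 that the relationship does not hold for every instance of a work. the identifiers and field names here are invented for illustration:

```python
# hypothetical sketch: the issued-with relationship recorded at the
# manifestation level, so it holds only for the copies that actually share
# a physical container, not for every manifestation of the same work.

manifestations = {
    "m1": {"work": "the immigrant (1917)",
           "container": "charlie chaplin, the early years, v. 1"},
    "m2": {"work": "the count",
           "container": "charlie chaplin, the early years, v. 1"},
    "m3": {"work": "easy street",
           "container": "charlie chaplin, the early years, v. 1"},
    # a different manifestation of the same work, issued on its own
    "m4": {"work": "the immigrant (1917)", "container": None},
}

def issued_with(m_id):
    """other manifestations issued in the same physical container."""
    container = manifestations[m_id]["container"]
    if container is None:
        return []
    return [other for other, rec in manifestations.items()
            if other != m_id and rec["container"] == container]
```

the relationship holds for the compilation copy but not for the stand-alone copy, even though both carry the same work; a display built from this data would still need to pull in enough information about each related work to avoid confusing the user.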
it will always be necessary to provide a plain-text escape hatch. the bibliographic world is a complex, constantly changing world filled with ambiguity.

what are the next steps?

in a sense, this paper is a first crude attempt at locating unmapped territory that has not yet been explored. if we were to decide as a community that it would be valuable to move our shared cataloging activities onto the semantic web, we would have a lot of work ahead of us. if some of the rdf problems described above are insoluble, we may need to work with semantic web developers to create a more sophisticated version of rdf that can handle the transitivity and complex linking required by our data. we will also need to encourage a very complex existing community to evolve institutional structures that would enable a more efficient use of the internet for the sharing of cataloging and other metadata creation. this is not just a technological problem, but also a political one. in the meantime, the experiment continues. let the thinking and learning begin!

references and notes

1. "notation3, or n3 as it is more commonly known, is a shorthand non–xml serialization of resource description framework models, designed with human-readability in mind: n3 is much more compact and readable than xml rdf notation. the format is being developed by tim berners-lee and others from the semantic web community." wikipedia, "notation 3," http://en.wikipedia.org/wiki/notation_3 (accessed feb. 19, 2009).

2. frbr review group, www.ifla.org/vii/s13/wgfrbr/; frbr review group, franar (working group on functional requirements and numbering of authority records), www.ifla.org/vii/d4/wg-franar.htm; frbr review group, frsar (working group, functional requirements for subject authority records), www.ifla.org/vii/s29/wgfrsar.htm; frbroo, frbr review group, working group on frbr/crm dialogue, www.ifla.org/vii/s13/wgfrbr/frbr-crmdialogue_wg.htm.

3.
library of congress, response to on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 24, 39, 40, www.loc.gov/bibliographic-future/news/lcwgrptresponse_dm_053008.pdf (accessed mar. 25, 2009).

4. ibid., 39.

5. ibid., 41.

6. dublin core metadata initiative, dcmi/rda task group wiki, http://www.dublincore.org/dcmirdataskgroup/ (accessed mar. 25, 2009).

7. mikael nilsson, andy powell, pete johnston, and ambjorn naeve, expressing dublin core metadata using the resource description framework (rdf), http://dublincore.org/documents/2008/01/14/dc-rdf/ (accessed mar. 25, 2009).

8. see for example table 6.3 in frbr, which maps to manifestation every kind of data that pertains to expression change with the exception of language change. ifla study group on the functional requirements for bibliographic records, functional requirements for bibliographic records (munich: k. g. saur, 1998): 95, http://www.ifla.org/vii/s13/frbr/frbr.pdf (accessed mar. 4, 2009).

9. roy tennant, "marc must die," library journal 127, no. 17 (oct. 15, 2002): 26.

10. w3c, skos simple knowledge organization system reference, w3c working draft 29 august 2008, http://www.w3.org/tr/skos-reference/ (accessed mar. 25, 2009).

11. the extract in figure 1 is taken from my complete rdf model, which can be found at http://myee.bol.ucla.edu/ycrschemardf.txt.

12. mary w. elings and gunter waibel, "metadata for all: descriptive standards and metadata sharing across libraries, archives and museums," first monday 12, no. 3 (mar. 5, 2007), http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1628/1543 (accessed mar. 25, 2009).

13. oclc, a holdings primer: principles and standards for local holdings records, 2nd ed. (dublin, ohio: oclc, 2008), 4, http://www.oclc.org/us/en/support/documentation/localholdings/primer/holdings%20primer%202008.pdf (accessed mar. 25, 2009).

14.
the library of congress working group, on the record: report of the library of congress working group on the future of bibliographic control (washington, d.c.: library of congress, 2008): 30, http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf (accessed mar. 25, 2009).

15. talis, sir tim berners-lee talks with talis about the semantic web: transcript of an interview recorded on 7 february 2008, http://talis-podcasts.s3.amazonaws.com/twt20080207_timbl.html (accessed mar. 25, 2009).

16. bruce d'arcus, e-mail to author, mar. 18, 2008.

17. ibid.

18. rob styles, e-mail to author, mar. 25, 2008.

19. bruce d'arcus, e-mail to author, mar. 18, 2008.

20. rob styles, e-mail to author, mar. 25, 2008.

21. w3c, "section 2.3, structured property values and blank nodes," in rdf primer: w3c recommendation 10 february 2004, http://www.w3.org/tr/rdf-primer/#structuredproperties (accessed mar. 25, 2009).

22. robert maxwell, frbr: a guide for the perplexed (chicago: ala, 2008).

entities/classes in rda, frbr, frad compared to yee cataloging rules (ycr)

rda, frbr, and frad | ycr
group 1: work | work
group 1: expression | expression; surrogate
group 1: manifestation | manifestation; title-manifestation; serial title
group 1: item | item
group 2: person | person; fictitious character; performing animal
group 2: corporate body | corporate body; corporate subdivision; place as jurisdictional corporate body; conference or other event as corporate body creator; jurisdictional corporate subdivision
family (rda and frad only) |
group 3: concept | concept
group 3: object | object
group 3: event | event or historical period as subject
group 3: place | place as geographic area
 | discipline
 | genre/form
name |
identifier |
controlled access point |
rules (frad only) |
agency (frad only) |

appendix.
entity/class and attribute/property comparisons

attributes/properties in frbr compared to frad

work
frbr: title of the work; form of work; date of the work; other distinguishing characteristics; intended termination; intended audience; context for the work; medium of performance (musical work); numeric designation (musical work); key (musical work); coordinates (cartographic work); equinox (cartographic work)
frad: form of work; date of the work; medium of performance; subject of the work; numeric designation; key; place of origin of the work; original language of the work; history; other distinguishing characteristic

expression
frbr: title of the expression; form of expression; date of expression; language of expression; other distinguishing characteristics; extensibility of expression; revisability of expression; extent of the expression; summarization of content; context for the expression; critical response to the expression; use restrictions on the expression; sequencing pattern (serial); expected regularity of issue (serial); expected frequency of issue (serial); type of score (musical notation); medium of performance (musical notation or recorded sound); scale (cartographic image/object); projection (cartographic image/object); presentation technique (cartographic image/object); representation of relief (cartographic image/object); geodetic, grid, and vertical measurement (cartographic image/object); recording technique (remote sensing image); special characteristic (remote sensing image); technique (graphic or projected image)
frad: form of expression; date of expression; language of expression; technique; other distinguishing characteristic

surrogate
(no attributes listed)

manifestation
frbr: title of the manifestation; statement of responsibility; edition/issue designation; place of publication/distribution; publisher/distributor; date of publication/distribution; fabricator/manufacturer; series statement; form of carrier; extent of the carrier; physical medium; capture mode; dimensions of the carrier; manifestation identifier; source for acquisition/access authorization; terms of availability; access restrictions on the manifestation; typeface (printed book); type size (printed book); foliation (hand-printed book); collation (hand-printed book); publication status (serial); numbering (serial); playing speed (sound recording); groove width (sound recording); kind of cutting (sound recording); tape configuration (sound recording); kind of sound (sound recording); special reproduction characteristic (sound recording); colour (image); reduction ratio (microform); polarity (microform or visual projection); generation (microform or visual projection); presentation format (visual projection); system requirements (electronic resource); file characteristics (electronic resource); mode of access (remote access electronic resource); access address (remote access electronic resource)
frad: edition/issue designation; place of publication/distribution; publisher/distributor; date of publication/distribution; form of carrier; numbering

title-manifestation
(no attributes listed)

serial title
(no attributes listed)

item
frbr: item identifier; fingerprint; provenance of the item; marks/inscriptions; exhibition history; condition of the item; treatment history; scheduled treatment; access restrictions on the item
frad: location of item
person
frbr: name of person; dates of person; title of person; other designation associated with the person
frad: dates associated with the person; title of person; other designation associated with the person; gender; place of birth; place of death; country; place of residence; affiliation; address; language of person; field of activity; profession/occupation; biography/history

fictitious character
(no attributes listed)

performing animal
(no attributes listed)

corporate body
frbr: name of the corporate body; number associated with the corporate body; place associated with the corporate body; date associated with the corporate body; other designation associated with the corporate body
frad: place associated with the corporate body; date associated with the corporate body; other designation associated with the corporate body; type of corporate body; language of the corporate body; address; field of activity; history

corporate subdivision
(no attributes listed)

place as jurisdictional corporate body
(no attributes listed)

conference or other event as corporate body creator
(no attributes listed)

jurisdictional corporate subdivision
(no attributes listed)

family
frad: type of family; dates of family; places associated with family; history of family

concept
frbr: term for the concept
frad: type of concept

object
frbr: term for the object
frad: type of object; date of production; place of production; producer/fabricator; physical medium

event
frbr: term for the event
frad: date associated with the event; place associated with the event

place
frbr: term for the place
frad: coordinates; other geographical information

discipline
(no attributes listed)

genre/form
(no attributes listed)

name
frad: type of name; scope of usage; dates of usage; language of name; script of name; transliteration scheme of name

identifier
frad: type of identifier; identifier string; suffix

controlled access point
frad: type of controlled access point; status of controlled access point; designated usage of controlled access point; undifferentiated access point; language of base access point; script of base access point; script of cataloguing; transliteration scheme of base access point; transliteration scheme of cataloguing; source of controlled access point; base access point; addition

rules
frad: citation for rules; rules identifier

agency
frad: name of agency; agency identifier; location of agency

attributes/properties in rda compared to ycr

work
rda: title of the work; form of work; date of work; place of origin of work; medium of performance; numeric designation; key; signatory to a treaty, etc.; other distinguishing characteristic of the work; original language of the work; history of the work; identifier for the work; nature of the content; coverage of the content; coordinates of cartographic content; equinox; epoch; intended audience; system of organization; dissertation or theses information
ycr: key identifier for work; language-based identifier (preferred lexical label); variant language-based identifier (alternate lexical label); language-based identifier (preferred lexical label) for work; language-based identifier for work (preferred lexical label) identified by principal creator in combination with uniform title; language-based identifier (preferred lexical label) for work identified by title alone (uniform title); supplied title for work; variant title for work; original language of work; responsibility for work; original publication statement of work; dates associated with work; original publication/release/broadcast date of work; copyright date of work; creation date of work; date of first recording of a work; date of first performance of a work; finding date of naturally occurring object; original publisher/distributor/broadcaster of work; places associated with work; original place of publication/distribution/broadcasting for work; country of origin of work; place of creation of work; place of first recording of work; place of first performance of work; finding place of naturally occurring object; original method of publication/distribution/broadcast of work; serial or integrating work original numeric and/or alphabetic designations—beginning; serial or integrating work original chronological designations—beginning; serial or integrating work original numeric and/or alphabetic designations—ending; serial or integrating work original chronological designations—ending; encoding of content of work; genre/form of content of work; original instrumentation of musical work; instrumentation of musical work—number of a particular instrument; instrumentation of musical work—type of instrument; original voice(s) of musical work; voice(s) of musical work—number of a particular type of voice; voice(s) of musical work—type of voice; original key of musical work; numeric designation of musical work; coordinates of cartographic work; equinox of cartographic work; original physical characteristics of work; original extent of work; original dimensions of work; mode of issuance of work; original aspect ratio of moving image work; original image format of moving image work; original base of work; original materials applied to base of work; work summary; work contents list; custodial history of work; creation of archival collection; censorship history of work; note about relationship(s) to other works

expression
rda: content type; date of expression; language of expression; other distinguishing characteristic of the expression; identifier for the expression; summarization of the content; place and date of capture; language of the content; form of notation; accessibility content; illustrative content; supplementary content; colour content; sound content; aspect ratio; format of notated music; medium of performance of musical content; duration; performer, narrator, and/or presenter; artistic and/or technical credits; scale; projection of cartographic content; other details of cartographic content; awards
ycr: key identifier for expression; language-based identifier (preferred lexical label) for expression; variant title for expression; nature of modification of expression; expression title; expression statement of responsibility; edition statement; scale of cartographic expression; projection of cartographic expression; publication statement of expression; place of publication/distribution/release/broadcasting for expression; place of recording for expression; publisher/distributor/releaser/broadcaster for expression; publication/distribution/release/broadcast date for expression; copyright date for expression; date of recording for expression; numeric and/or alphabetic designations for serial expressions; chronological designations for serial expressions; performance date for expression; place of performance for expression; extent of expression; content of expression; language of expression text; language of expression captions; language of expression sound track; language of sung or spoken text of expression; language of expression subtitles; language of expression intertitles; language of summary or abstract of expression; instrumentation of musical expression; instrumentation of musical expression—number of a particular instrument; instrumentation of musical expression—type of instrument; voice(s) of musical expression; voice(s) of musical expression—number of a particular type of voice; voice(s) of musical expression—type of voice; key of musical expression; appendages to the expression; expression series statement; mode of issuance for expression; notes about expression

surrogate
ycr: [under development]
manifestation
rda: title; statement of responsibility; edition statement; numbering of serials; production statement; publication statement; distribution statement; manufacture statement; copyright date; series statement; mode of issuance; frequency; identifier for the manifestation; note; media type; carrier type; base material; applied material; mount; production method; generation; layout; book format; font size; polarity; reduction ratio; sound characteristics; projection characteristics of motion picture film; video characteristics; digital file characteristics; equipment and system requirements; terms of availability
ycr: key identifier for manifestation; publication statement of manifestation; place of publication/distribution/release/broadcast of manifestation; manifestation publisher/distributor/releaser/broadcaster; manifestation date of publication/distribution/release/broadcast; carrier edition statement; carrier piece count; carrier name; carrier broadcast standard; carrier recording type; carrier playing speed; carrier configuration of playback channels; process used to produce carrier; carrier dimensions; carrier base materials; carrier generation; carrier polarity; materials applied to carrier; carrier encoding format; intermediation tool requirements; system requirements; serial manifestation illustration statement; manifestation standard number; manifestation isbn; manifestation issn; manifestation publisher number; manifestation universal product code; notes about manifestation

title-manifestation
ycr: key identifier for title-manifestation; variant title for title-manifestation; title-manifestation title; title-manifestation statement of responsibilities; title-manifestation edition statement; publication statement of title-manifestation; place of publication/distribution/release/broadcasting of title-manifestation; publisher/distributor/releaser, broadcaster of title-manifestation; date of publication/distribution/release/broadcast of title-manifestation
title-manifestation series; title-manifestation mode of issuance; notes about title-manifestation; title-manifestation standard number

serial title
ycr: key identifier for serial title; variant title for serial title; title of serial title; serial title statement of responsibility; serial title edition statement; publication statement of serial title; place of publication/distribution/release/broadcast of serial title; publisher/distributor/releaser/broadcaster of serial title; date of publication/distribution/release/broadcast of serial title; serial title beginning numeric and/or alphabetic designations; serial title beginning chronological designations; serial title ending numeric and/or alphabetic designations; serial title ending chronological designations; serial title frequency; serial title mode of issuance; serial title illustration statement; notes about serial title; serial title issn-l

item
rda: preferred citation; custodial history; immediate source of acquisition; identifier for the item; item-specific carrier characteristics
ycr: key identifier for item; item barcode; item location; item call number or accession number; item copy number; item provenance; item condition; item marks and inscriptions; item exhibition history; item treatment history; item scheduled treatment; item access restrictions
person
rda: name of the person; preferred name for the person; variant name for the person; date associated with the person; title of the person; fuller form of name; other designation associated with the person; gender; place of birth; place of death; country associated with the person; place of residence; address of the person; affiliation; language of the person; field of activity of the person; profession or occupation; biographical information; identifier for the person
ycr: key identifier for person; language-based identifier (preferred lexical label) for person; clan name of person; forename/given name/first name of person; matronymic of person; middle name of person; nickname of person; patronymic of person; surname/family name of person; natural language order of forename, surname, middle name, patronymic, matronymic and/or clan name of person; affiliation of person; biography/history of person; date of birth of person; date of death of person; ethnicity of person; field of activity of person; gender of person; language of person; place of birth of person; place of death of person; place of residence of person; political affiliation of person; profession/occupation of person; religion of person; variant name for person

fictitious character
ycr: [under development]

performing animal
ycr: [under development]

corporate body
rda: name of the corporate body; preferred name for the corporate body; variant name for the corporate body; place associated with the corporate body; date associated with the corporate body; associated institution; other designation associated with the corporate body; language of the corporate body; address of the corporate body; field of activity of the corporate body; corporate history; identifier for the corporate body
ycr: key identifier for corporate body; language-based identifier (preferred lexical label) for corporate body; dates associated with corporate body; field of activity of corporate body; history of corporate body; language of corporate body; place associated with corporate body; type of corporate body; variant name for corporate body

corporate subdivision
ycr: [under development]

place as jurisdictional corporate body
ycr: [under development]

conference or other event as corporate body creator
ycr: [under development]

jurisdictional corporate subdivision
ycr: [under development]

family
rda: name of the family; preferred name for the family; variant name for the family; type of family; date associated with the family; place associated with the family; prominent member of the family; hereditary title; family history; identifier for the family

concept
rda: term for the concept; preferred term for the concept; variant term for the concept; type of concept; identifier for the concept
ycr: key identifier for concept; language-based identifier (preferred lexical label) for concept; qualifier for concept language-based identifier; variant name for concept

object
rda: name of the object; preferred name for the object; variant name for the object; type of object; date of production; place of production; producer/fabricator; physical medium; identifier for the object
ycr: key identifier for object; language-based identifier (preferred lexical label) for object; qualifier for object language-based identifier; variant name for object

event
rda: name of the event; preferred name for the event; variant name for the event; date associated with the event; place associated with the event; identifier for the event
ycr: key identifier for event or historical period as subject; language-based identifier (preferred lexical label) for event or historical period as subject; beginning date for event or historical period as subject; ending date for event or historical period as subject; variant name for event or historical period as subject

place
rda: name of the place; preferred name for the place; variant name for the place; coordinates; other geographical information; identifier
for the place key identifier for place as geographic area language-based identifier (preferred lexical label) for place as geographic area qualifier for place as geographic area variant name for place as geographic area discipline key identifier for discipline language-based identifier (preferred lexical label) (name or classification number or symbol) for discipline translation of meaning of classification number or symbol for discipline attributes/properties in rda compared to ycr (cont.) 80 information technology and libraries | june 2009 model entity rda ycr genre/form key identifier for genre/form language-based identifier (preferred lexical label) for genre/form variant name for genre/form name scope of usage date of usage identifier controlled access point rules agency note: in rda, the following attributes have not yet been assigned to a particular class or entity: extent, dimensions, terms of availability, contact information, restrictions on access, restrictions on use, uniform resource locator, status of identification, source consulted, cataloguer’s note, status of identification, and undifferentiated name indicator. name is being treated as both a class and a property. identifier and controlled access point are treated as properties rather than classes in both rda and ycr. attributes/properties in rda compared to ycr (cont.) utilizing technology to support and extend access to students and job seekers during the pandemic public libraries leading the way utilizing technology to support and extend access to students and job seekers during the pandemic daniel berra information technology and libraries | march 2021 https://doi.org/10.6017/ital.v40i1.13261 daniel berra (danielb@pfulgervilletx.gov) is assistant director, pflugerville (texas) public library. © 2021. “public libraries leading the way” is a regular column spotlighting technology in public libraries. the ongoing pandemic necessitated a reimaging of public library services and resources. 
out of this challenge rose opportunities to better serve the needs of our communities during the pandemic and beyond. when our library first closed our doors to the public last march, we began discussions on how the needs of our community had changed. we identified two key groups for whom the pandemic had forced an uncomfortable shift: students suddenly thrust into virtual learning and adults who had lost their jobs. while we continue to serve all members of our community in a variety of ways, we looked to increase support for these specific groups utilizing available technology. like many public libraries, the pflugerville public library quickly shifted our service model to include virtual programs, curbside pickup, library cards issued remotely, and a focus on electronic resources. our community is rapidly growing and diverse. many of our nearly 70,000 residents are frequent users of library services, attend our wide array of programs, hold meetings, study or work inside the building, and enjoy both the physical and virtual library collection. the pandemic shift required our talented staff to find ways to provide a similar level of service to a community that heavily utilizes the library. for both students and job seekers, we took steps to alleviate some of the difficulties the building’s closure caused by utilizing existing technology. we worked with the city’s it department to extend the library’s wi-fi to cover the entire parking lot, allowing for 24-hour access. we also utilized our existing print-from-your-own-device system to allow library users to submit print jobs and then pick them up through our curbside service. we added additional wi-fi hotspots available for checkout to ensure access at home for those lacking internet. since these services were already offered to some degree, the expansion of access was relatively easy to implement.
for students we drew upon our existing relationship with the pflugerville independent school district (pfisd) to provide support and extend access. we expanded the offering of our special digit cards, which allow students to sign up for an account giving them access to all of our electronic resources and wi-fi hotspots. the school district’s librarians handle the signups and then submit the forms so we can set up the accounts and contact students by email or phone. we further extended access to ebooks by working with the district and our vendor, overdrive, to provide a direct way for students to browse and check out through the district’s own ebook app. this allows students to seamlessly see both of our collections, significantly increasing their reading options and removing barriers to access. on the support front, we utilized a portion of the city’s cares act funds directed toward the library to launch a live, virtual tutoring service called brainfuse helpnow. students of all ages have anonymous access to tutors from home seven days a week, as well as additional homework support resources. this piece meshes nicely with some of our virtual programming for teens, like our sat and act practice tests and other test- and career-preparation e-resources. recognizing the pandemic’s impact on the economy, and how this directly affects our community, we worked to prioritize support for the unemployed and under-employed. we added a resume review/job-search coaching service led by two of our circulation staff members. we utilized another portion of our cares act funds to offer career online high school, providing adults with access to an online program to obtain their high school diploma. we also began lending laptops for home use to ensure access to necessary technology.
some of our support was already in place before the pandemic began, and we made a significant marketing push to highlight these e-resources. for instance, we partner with the pflugerville community development corporation to provide the online training resource lynda.com (soon to be linkedin learning). we saw a large increase in usage, particularly in the first few months of the pandemic, as community members looked to add employable skills to their toolboxes. we also created a page on our website with all of our job search assistance resources and services highlighted in one place. while the main emphasis of these efforts is on technology, serving the needs of the entire community also requires supporting those who are generally less connected. we have to balance our digital expectations with something more tangible, recognizing many library users still utilize the library in a more traditional way. for students, our senior youth services librarian partnered with pfisd for a book giveaway in conjunction with the district’s food distribution program to get books in the hands of children for the summer. we also began distributing “care kits” through our curbside service that include personal grooming products and cold weather gear for anyone in need. while 2020 featured the addition of many new services or significant expansion of existing ones, we are focused in 2021 on increasing our marketing efforts for these offerings. relying too heavily on digital forms of communication can limit the impact of our services. for instance, if we want to let people who do not have access to the internet at home know we have wi-fi hotspots and laptops available for checkout, then spreading the word through our standard methods of social media, website, and email will prove ineffective. with the building currently closed to the public, we face an additional barrier to communication.
to help alleviate some of this, we have created a job search assistance flyer that we are distributing at places like local food pantries. we plan to expand on similar methods of marketing throughout the year. while positive feedback is often hidden from libraries since we prioritize patron privacy and anonymity, we have received a few specific stories that highlight our impact. our first scholarship recipient for career online high school shared how the opportunity to obtain her high school diploma will open up new professional avenues and erase the stigma of having not completed high school. another community member who took advantage of our job search coaching to prepare for an interview expressed gratitude to the library staff who helped increase his employment chances. we also see resumes and homework assignments printed through our virtual printing service, hear from parents with children utilizing hotspots for virtual schooling, see cars in the parking lot using the extended wi-fi, and track statistics showing a large increase in the usage of our electronic resources. the ongoing pandemic necessitated a re-imagining of library services. the needs of our community changed, and we set out to find ways to provide assistance to those who need it the most utilizing technology, while remaining mindful of those who are not as comfortable in the digital age. the combination of utilizing technology to address current needs and expanding access to this technology has allowed us to better serve the community. we are in the process now of evaluating all of our changes to determine which ones will continue even after the pandemic ends.
we already know that we will keep our methods of extending access like the expanded wi-fi availability, laptops for checkout, digit cards for students, and the seamless connection to our ebook collection for pfisd. in the area of support, we will continue to offer career online high school, brainfuse helpnow for virtual tutoring, and our resume review/job search coaching service. public libraries are well positioned to innovate and adjust to changes in society. it is one of the things we do extremely well, out of necessity, but also out of a deep desire to serve our communities. all of the shifts the pflugerville public library made related to supporting students and job seekers drew upon existing technology and available resources. what changed were the areas on which we chose to focus our efforts. by prioritizing support and access while pinpointing the needs of the moment, we found ways to better serve our community within the context of everything else we provide. while the jury is still out on how successful some of these initiatives will prove, we already know that many of these changes will continue long after the pandemic ends.

editorial board thoughts
mark dehmlow

the ten commandments of interacting with nontechnical people

more than ten years of working with technology and interacting with nontechnical users in a higher education environment has taught me many lessons about successful communication strategies. somehow, in that time, i have been fortunate to learn some effective mechanisms for providing constructive support and leading successful technical projects with both technically and “semitechnically” minded patrons and librarians. i have come to think of myself as someone who lives in the “in between,” existing more in the beyond than the bed or the bath, and, while not a native of either place, i like to think that i am someone who is comfortable in both the technical and traditional cliques within the library.
ironically, it turns out that the most critical pieces to successfully implementing technology solutions and bridging the digital divide in libraries have been categorically nontechnical in nature; it all comes down to collegiality, clear communication, and a commitment to collaboration. as i ruminated on the last ten-plus years of working in technology, i began to think of the behaviors and techniques that have proved most useful in developing successful relationships across all areas of the library. the result is this list of the top ten dos and don’ts for those of us self-identified techies who are working more and more often with the self-identified nontechnical set.

1. be inclusive—i have been around long enough to see how projects that include only technical people are doomed to scrutiny and criticism. the single best strategy i have found for getting buy-in for technical projects is to include key stakeholders and those with influence in project planning and core decision-making. not only does this create support for projects, but it encourages others to have a sense of ownership in project implementation—and when people feel ownership for a project, they are more likely to help it succeed.

2. share the knowledge—i don’t know if it is just the nature of librarianship, but librarians like to know things, and more often than not they have a healthy sense of curiosity about how things work. i find it goes a long way when i take a few moments to explain how a particular technology works. our public services specialists, in particular, often want to know the details of how our digital tools work so that they can teach users most effectively and answer questions users have about how they function. sharing expertise is a really nice way to be inclusive.

3. know when you have shared enough—in the same way that i don’t need to know every deep detail of collections management to appreciate it, most nontechies don’t need hour-long lectures on how each component of technology relates to the other. knowing how much information to share when describing concepts is critical to keeping people’s interest and generally keeping you approachable.

4. communicate in english—it is true that every specialization has its own vocabulary and acronyms (oh how we love acronyms in libraries) that have no relevance to nonspecialists. i especially see this in the jargon we use in the library to describe our tools and services. the best policy is to avoid jargon and explain concepts in lay-person’s terms or, if using jargon is unavoidable, define specialized words in the simplest terms possible. using analogies and drawing pictures can be excellent ways to describe technical concepts and how they work. it is amazing how much from kindergarten remains relevant later in life!

5. avoid techno-snobbery—i know that i am risking virtual ostracism in writing this, but i think it needs to be said. just because i understand technology does not make me better than others, and i have heard some variant of the “cup holder on the computer” joke way too often. even if you don’t make these kinds of comments in front of people who aren’t as technically capable as you, the attitude will be apparent in your interactions, and there is truly nothing more condescending.

6. meet people halfway—when people are trying to ask technology-related questions or converse about technical issues, don’t correct small mistakes. instead, try to understand and coax out their meaning; elaborate on what they are saying, and extend the conversation to include information they might not be aware of. people don’t like to be corrected or made to feel stupid—it is embarrassing. if their understanding is close enough to the basic idea, letting small mistakes in terminology slide can create an opening for a deeper understanding. you can provide the correct terminology when talking about the topic without making a point to correct people.

7. don’t make a clean technical/nontechnical distinction—after once offering the “technical” perspective on a topic, one librarian said to me that it wasn’t that they themselves didn’t have any technical perspective, it just wasn’t perhaps as extensive as mine. each person has some level of technical expertise; it is better to encourage the development of that understanding than to compartmentalize people on the basis of their area of expertise.

8. don’t expect everyone to be interested—just because i chose a technical track and am interested in it doesn’t mean everyone should be. sometimes people just want to focus on their area of expertise and let the technical work be handled by the techies.

9. assume everyone is capable—at least at some level. sometimes it is just a question of describing concepts in the right way, and besides, not everyone should be a programmer. everyone brings their own skills to the table, and that should be respected.

10. expertise is just that—and no one, no one knows everything. there just isn’t enough time, and our brains aren’t that big. embrace those with different expertise, and bring those perspectives into your project planning. a purely technical perspective, while perhaps being efficient, may not provide a practical or intuitive solution for users. diversity in perspective creates stronger projects.

mark dehmlow (mdehmlow@nd.edu) is digital initiatives librarian, hesburgh libraries, university of notre dame, notre dame, indiana.
in the same way that the most interesting work in academia is becoming increasingly more multidisciplinary, so too the most successful work in libraries needs to bring diverse perspectives to the fore. while it is easy to say libraries are constantly becoming more technically oriented because of the expanse of digital collections and services, the need for the convergence of the technical and traditional domains is clear—digital preservation is a good example of an area that requires the lessons and strengths learned from physical preservation, and, if anything, the technical aspects still raise more questions than solutions—just read henry newman’s article “rocks don’t need to be backed up” to see what i mean.1 increasingly, as we develop and implement applications that better leverage our collections and highlight our services, their success hinges on their usability, user-driven design, and implementations based on user feedback. these “user”-based evaluation techniques fit more closely with traditional aspects of public services: interacting with patrons. lastly, it is also important to remember that technology can be intimidating. it has already caused a good deal of anxiety for those in libraries who are worried about long-term job security as technology continues to initiate changes in the way we perform our jobs. one of the best ways to bring people along is to demystify the scary parts of technology and help them see a role for themselves in the future of the library. going back to maslow’s hierarchy of needs, people want to feel a sense of security and belonging, and i believe it is incumbent upon those of us with a deep understanding of technology to help bring the technical to the traditional in a way that serves everyone in the process.

reference

1. henry newman, “rocks don’t need to be backed up,” enterprisestorageforum.com (mar. 27, 2009), www.enterprisestorageforum.com/continuity/features/article.php/3812496 (accessed april 24, 2009).
student use of library computers: are desktop computers still relevant in today’s libraries?
susan thompson
information technology and libraries | december 2012

abstract

academic libraries have traditionally provided computers for students to access their collections and, more recently, facilitate all aspects of studying. recent changes in technology, particularly the increased presence of mobile devices, call into question how libraries can best provide technology support and how it might affect the use of other library services. a two-year study conducted at california state university san marcos library analyzed student use of computers in the library, both the library’s own desktop computers and laptops owned by students. the study found that, despite the increased ownership of mobile technology by students, they still clearly preferred to use desktop computers in the library. it also showed that students who used computers in the library were more likely to use other library services and physical collections.

introduction

for more than thirty years, it has been standard practice in libraries to provide some type of computer facility to assist students in their research. originally, the focus was on providing access to library resources, first the online catalog and then journal databases. for the past decade or so, this has expanded to general-use computers, often in an information-commons environment, capable of supporting all aspects of student research from original resource discovery to creation of the final paper or other research product. however, times are changing, and the ready access to mobile technology has brought into question whether libraries need to or should continue to provide dedicated desktop computers. do students still use and value access to computers in the library? what impact does student computer use have on the library and its other services?
have we reached the point where we should reevaluate how we use computers to support student research? california state university san marcos (csusm) is a public university with about nine thousand students, primarily undergraduates from the local area. csusm was established in 1991 and is one of the youngest campuses in the 23-campus california state university system. the library, originally located in space carved out of an administration building, moved into its own dedicated library building in 2004. one of the core principles in planning the new building was the vision of the library as a teaching and learning center. as a result, a great deal of thought went into the design of technology to support this vision. rather than viewing technology’s role as just supporting access to library resources, we expanded its role to providing cradle-to-grave support for the entire research process. we also felt that encouraging students to work in the library would promote use of traditional library materials and the expertise of library staff, since these resources would be readily available.1

susan thompson (sthompsn@csusm.edu) is coordinator of library systems, california state university san marcos.

rethinking our assumptions about library technology’s role in the student research process led us to consider the entire building as a partner in the students’ learning process. rather than centralizing all computer support in one information commons, we wanted to provide technology wherever students want to use it. we used two strategies. first, we provided centralized technology using more than two hundred desktop computers, most located in four of our learning spaces: reference, classrooms, the media library, and the computer lab. three of these spaces are configured like information commons, providing full-service research computers grouped around the service desks near each library entrance.
in addition, simplified “walk-up” computers are available on every floor. the simplified computers provide limited web services to encourage quick turnaround and have no login requirement, ensuring ready access to library collections for everyone, including community members. the other major component of our technology plan was the provision of wireless throughout the building, along with extensive power outlets to support mobile computing. more than forty quiet study rooms, along with table “islands” in the stacks, help support the use of laptops for group study. however, only two of these quiet study rooms, located in the media library, provide desktop computers designed specifically to support group work. in 2009 and again in 2010, we conducted computer use studies to evaluate the success of the library’s technology strategy and determine whether the library’s desktop computers were still meeting student needs as envisioned by the building plan. the goal of the study was to obtain a better understanding of how students use the library’s computers, including types of applications used, computer preferences, and computer-related study habits. the study addressed several specific research questions. first, librarians were concerned that the expanded capabilities of the desktop computers distracted students from an academic and library research focus. were students using the library’s computers appropriately? second, the original technology plan had provided extensive support for mobile technology, but the technology landscape has changed over time. how did the increase in student ownership of mobile devices—now at more than 80 percent—affect the use of the desktop computers? finally, did providing an application-rich computer environment encourage students to conduct more of their studying in the library, leading them to use traditional library collections and services more frequently?
this article will focus on the study results pertaining to the second and third research questions. we found that, in line with our expectations, students using library computer facilities also made extensive use of traditional library services. however, we were surprised to discover that the growing availability of mobile devices had relatively little impact on students’ continuing preference for library-provided desktop computers.

literature review

the concept of the information commons was just coming into vogue in the early 2000s, when we were designing our library building, and it strongly influenced our technology design as well as building design. information commons, defined by steiner as the “functional integration of technology and service delivery,” have become one of the primary methods by which libraries provide enhanced computing support for students studying in the library.2 one of the changes in libraries motivating the information-commons concept is the desire to support a broad range of learning styles, including the propensity to mix academic and social activities. particularly influential to our design was the concept of the information commons supporting students’ projects “from inception to completion” by providing appropriate technologies to facilitate research, collaboration, and consultation.3 providing access to computers appears to contribute to the value of libraries as “place.” shill and toner, early in the era of information commons, noted “there are no systematic, empirical studies documenting the impact of enhanced library buildings on student usage of the physical library.”4 since then, several evaluations of the information-commons approach seem to show a positive correlation between creation of a commons and higher library usage because students are now able to complete all aspects of their assignments in the library.
for example, the university of tennessee and indiana university have shown significant increases in gate counts after they implemented their commons.5 while many studies discuss the value of information commons, very few look at why library computers are preferred over computers in other areas on campus. burke looked at factors influencing students’ choice of computing facilities at an australian university.6 given a choice of central computer labs, residence hall computers, and the library’s information commons, most students preferred the computers in the library over the other computer locations, with more than half using the library computers more than once a week. they rated the library most highly on its convenience and closeness to resources. perhaps the most important trend likely to affect libraries’ support for student technology needs is the increased use of mobile technology. the 2010 nationwide educause center for applied research (ecar) study, from the same year as the second csusm study, showed that 89 percent of students had laptops.7 other nationwide studies have corroborated this high level of laptop ownership.8 so, does this increased use of laptops and mobile devices affect the use of desktop computers? the 2010 ecar study reported that desktop ownership (about 50 percent in 2010) had declined by more than 25 percent between 2006 and 2009, a significant period in the lifetime of csusm’s new library building. pew’s internet & american life project trend data showed desktop ownership as the only gadget category in which ownership is decreasing, from 68 percent in 2006 to 55 percent at the end of 2011.9 some libraries and campuses are beginning to respond to the increase in laptop ownership by changing their support for desktop computers.
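the ownership figures above mix two ways of describing a decline: a drop in percentage points versus a decline relative to the starting share. a small sketch, using the pew figures quoted above (the helper names are mine, not from either study), shows how the same numbers read differently depending on which measure is meant:

```python
def point_drop(old_pct, new_pct):
    # absolute change, in percentage points
    return old_pct - new_pct

def relative_decline(old_pct, new_pct):
    # change relative to the starting share, as a percentage
    return (old_pct - new_pct) / old_pct * 100

# pew trend data: desktop ownership 68% (2006) -> 55% (end of 2011)
print(point_drop(68, 55))                  # 13 percentage points
print(round(relative_decline(68, 55), 1))  # 19.1, i.e., about a fifth of the 2006 share
```

read this way, ecar's "declined by more than 25 percent" is plausibly a relative decline rather than a percentage-point drop, though the study's own wording does not say which is meant.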
university of colorado boulder, in an effort to decrease costs and increase availability of flexible campus spaces, is making a major move away from providing desktop computers.10 while they found that 97 percent of their students own laptops and other mobile devices, they were concerned that many students still preferred to use desktop computers when on campus. to entice students to bring their laptops to campus, the university is enhancing their support for mobile devices by converting their central computer labs into flexible-use space with plentiful power outlets, flexible furniture, printing solutions, and access to the usual campus software. nevertheless, it may be premature for all libraries and universities to eliminate their desktop computer support. tom, voss, and scheetz found students want flexibility with a spectrum of technological options.11 certainly, they want wi-fi and power outlets to support their mobile technology. however, students also want conventional campus workstations providing a variety of functions, such as quick print and email computers, long-term workstations with privacy, and workstations at larger tables with multiple monitors that support group work. while the ubiquity of laptops is an important factor today, other forms of mobile devices may become more important in the future. a 2009 wall street journal article reported the trend for business travelers is to rely on smartphones rather than laptops.12 for the last three years, educause’s horizon reports have made support for non-laptop mobile technologies one of the top trends. 
the 2009 horizon report mentioned that in countries like japan, “young people equipped with mobiles often see no reason to own personal computers.”13 in 2010, horizon reported an interesting pilot project at a community college in which one group of students was issued mobile devices and another group was not.14 members of the group with the mobile devices were found to work on the course more during their spare time. the 2011 horizon report discusses mobiles as capable devices in their own right that are increasingly users’ first choice for internet access.15 therefore, rather than trying to determine which technology is most important, libraries may need to support multiple devices. trends described in the ecar and horizon studies make it clear that students own multiple devices. so how do they use them in the study environment? head’s interviews with undergraduate students at ten us campuses found that “students use a less is more approach to manage and control all of the it devices and information systems available to them.”16 for example, in the days before final exams, students were selective in their use of technology to focus on coursework yet remain connected with the people in their lives. the question then may not be which technology libraries should support but rather how to support the right technology at the right time.

method

the csusm study used a mixed-method approach, combining surveys with real-time observation to improve the effectiveness of assessment and generate a more holistic understanding of how library users made their technology choices. the study protocol received exempt status from the university human subjects review board. it was carried out twice over a two-year period to determine whether the time of the semester affected usage. in 2009, the study was administered at the end of the spring term, april 15 to may 3.
we expected that students near the end of the term would be preparing for finals and completing assignments, including major projects. the 2010 study was conducted near the beginning of the term, february 4 to february 18. we expected that early-term students would be less engaged in academic assignments, particularly major research projects. we carried out each study over a two-week period and attempted to keep conditions consistent by duplicating each survey time and location across both years. each location was surveyed monday through thursday, once in the morning and once in the afternoon, during the heavy-use times of 11 a.m. and 2 p.m. the survey locations included two large computer labs (more than eighty computers each), one located near the library reference desk and one near the academic technology helpdesk. other locations included twenty computers in the media library, a handful of desktop computers in the curriculum area, and laptop users, mostly located on the fourth and fifth floors of the library. the fourth- and fifth-floor observations also included the library’s forty quiet study rooms. for the 2010 study, the other large computer lab on campus (108 computers), located outside the library, was also included for comparison purposes. we used two techniques: a quantitative survey of library computer users and a qualitative observation of software application usage and selected study habits. the survey tried to determine the purpose for which the student was using the computer that day, what their computer preference was, and what other business they might have in the library. it also asked students for their suggestions for changes in the library. the survey was usually completed within the five-minute period that we had estimated and contained no identifying personal information.
the survey administrator handed out the one-page paper survey, along with a pencil if desired, to each student using a library workstation or a laptop during each designated observation period. users who refused to take the survey were counted in the total number of students asked to do the survey. however, users who indicated they refused because they had already completed a survey on a previous observation date were marked as “dup” in the 2010 survey and were not counted again. the “dup” statistic proved useful as an independent confirmation of the popularity of the library computers. the second method involved conducting “over-the-shoulder” observations of students using the library computers. while students were filling out the paper survey, the survey administrator walked behind the users and inconspicuously looked at their computer screens. all users in the area were observed whether or not they had agreed to take the survey. the one exception was users in group-study rooms: the observer did not enter the room and could only note behaviors visible from the door window, such as laptop usage or group studying. based on brief (one minute or less) observations, administrators noted on a form the type of software application the student was using at that point in time. the observer also noted other, non-desktop technical devices in use (specifically laptops, headphones, and mobile devices such as smartphones) and study behaviors, such as group work (defined as two or more people working together). the student was not identified on the form. we felt that these observations could validate information provided by the users on the survey.

results

we completed 1,452 observations in 2009 and 2,501 observations in 2010. the gate counts for the primary month each study took place—70,607 for april 2009 and 59,668 for february 2010—show the library was used more heavily during the final exam period.
the larger number of results the second year was due to more careful observation of laptop and study-group computer users on the fourth and fifth floors and the addition of observations in a nonlibrary computer lab, rather than an increase in students available to be observed. the observations looked at application usage, study habits, and devices present, but this article will only discuss the observations pertaining to devices. in 2009, 17 percent of students were observed using laptops (see table 1). this number almost doubled in 2010 to 33 percent. most laptop users were observed on the fourth and fifth floors, where furniture, convenient electrical outlets, and quiet study rooms provided the best support for this technology. very few desktop computers were available there, so students desiring to study on these floors had to bring their own laptops. almost 20 percent of students in 2010 were observed with other mobile technology, such as cell phones or ipods, and 16 percent were wearing headphones, which indicated there was other, often not visible, mobile technology in use.

table 1. mobile technology observed

in 2009, 1,141 students completed the computer-use survey. however, we were unable to accurately determine the return rate that year. the nature of the study, which surveyed the same locations multiple times, meant that many of the students were approached more than once to complete the survey. thus the majority of the refusals to take the survey were because the subject had already completed one previously. the 2010 study accounted for this phenomenon by counting refusals and duplications separately. in 2010, 1,123 students completed the survey out of 1,423 unique asks, resulting in a 79 percent return rate. the 619 duplicates counted represented about half of the 2010 surveys completed and could be considered another indicator of frequent use of the library’s computers.
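as a quick illustrative check, the 2010 return-rate and duplicate figures above can be recomputed directly. this is our own recalculation from the reported numbers, not part of the original study materials, and the variable names are our own:

```python
# illustrative recomputation of the 2010 survey figures cited above;
# the numbers come from the text, the variable names are ours.
completed = 1123    # surveys completed in 2010
unique_asks = 1423  # unique students asked to take the survey
duplicates = 619    # refusals because a survey was already completed

return_rate = completed / unique_asks
duplicate_share = duplicates / completed

print(f"return rate: {return_rate:.0%}")          # 79%, as reported
print(f"duplicate share: {duplicate_share:.0%}")  # 55%, i.e., "about half"
```

both reported figures check out: 1,123 / 1,423 rounds to 79 percent, and the 619 duplicates are indeed about half of the completed surveys.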
the 2010 results included an additional 290 surveys completed by students using the other large computer lab on campus outside the library.

table 1 (mobile technology observed): laptop in use, 17% in 2009 and 33% in 2010; headphones in use, 16% in 2010; mobile device in use (cell phone or ipod), 18% in 2010.

table 2. frequency of computer use: daily when on campus, 49% in 2009 and 42% in 2010; several times a week, 33% and 30%; several times a month, 11% and 15%; rarely use computers in the library, 9% and 10%.

in both years of the study, 78 percent of students said they preferred to use computers in the library to other computer lab locations on campus. students also indicated they were frequent users (see table 2). in 2009, 82 percent of students used the library computers frequently—49 percent daily and 33 percent several times a week. the frequency of use in the 2010 early-term study dropped about 10 percent to 72 percent, but with the same proportion of daily vs. weekly users. convenience and quiet were the top reasons, given by more than half of students, as to why they preferred the library computers, followed closely by atmosphere. about a quarter of students preferred library computers because of their close access to other library services.

table 3. preferred computer to use in the library

the types of computer that students preferred to use in the library were desktop computers, followed by laptops owned by the students (see table 3). it is notable that the preference for desktop computers changed significantly from 2009 to 2010: 84 percent of students preferred desktop computers in 2009 vs. 72 percent in 2010—a 12 percent decrease. not surprisingly, few students preferred the simplified walk-up computers used for quick lookups. however, we did not expect such little interest in checking out laptops, with only 2 percent preferring that option.
the 2010 study added a new question to the survey to better understand the types of technology devices owned by students (see table 4). in 2010, 84 percent of students owned a laptop (combining the netbook and laptop statistics). almost 40 percent of students owned a desktop, so many students owned more than one type of computer. of the 85 percent of students who indicated they had a cell phone, about one-third indicated they owned smartphones. the majority of students owned music players. the one technology students were not interested in was e-book readers, with less than 2 percent indicating ownership.

table 3 (preferred computer to use in the library): sit-down pc, 84% in 2009 and 71% in 2010; walk-up pc, 6% and 5%; own laptop, 23% and 28%; laptop checked out in library, 2% in both years.

table 4. technology devices owned by students (2010)

to understand how the use of technology might affect use of the library in general, the survey asked students what other library services they used on the same day they were using library computers. table 5 shows survey responses are very similar between the late-term 2009 study and the early-term 2010 study. by far the most popular use of the library, by more than three-quarters of the students, was for study. around 25 percent of the students planned to meet with others, and 20 percent planned to use the media services. around 15 percent of students planned to check out print books, 15 percent planned to use journals, and 10 percent planned to ask for help. the biggest difference for students early in the term was an increased interest (5 percent more) in using the library for study. the late-term students were 9 percent more likely to meet with others. by contrast, users in the nonlibrary computer lab were much less likely to make use of other library services. only 24 percent of nonlibrary users planned to study in the library, and 8 percent planned to meet with others in the library that day.
use of all other library services was less than 5 percent among the nonlibrary computer users.

table 4 (technology devices owned by students, 2010): laptop, 77%; ipod/mp3 music player, 59%; regular cell phone, 52%; desktop computer, 40%; smartphone, 31%; netbook, 7%; other handheld devices, 1%; kindle/book reader, 1%.

table 5. other library services used

in 2010, we also asked users what changes they would like in the library, and 58 percent of respondents provided suggestions. the question was not limited to technology, but by far the biggest request for change was to provide more computers (requested by 30 percent of all respondents). analysis of the other survey questions regarding computer ownership and preferences revealed who was requesting more traditional desktops in the library. surprisingly, most were laptop users: 90 percent of laptop owners wanted more computers, and 88 percent of the respondents making this request were located on the fourth and fifth floors, which were used almost exclusively by laptop users. the next most common comments were remarks indicating student satisfaction with the current library services: 19 percent of students said they were satisfied with current library services, and 9 percent praised the library and its services. commonality of requests dropped quickly at that point, with the fourth most common request being for more quiet (2 percent).

table 5 (other library services used; 2009, 2010, nonlibrary lab): study, 76%, 81%, 23%; meet with others, 35%, 26%, 7%; use media, 20%, 22%, 4%; checkout a book, 16%, 13%, 3%; look for journals/newspapers, 15%, 13%, 3%; ask questions/get help, 10%, 10%, 2%; use a reserve book, 8%, 9%, 2%; create a video/web page, 6%, 3%, 0%; pick up ill/circuit, 3%, 3%, 0%; other, 0%, 4%, 1%.

discussion

the results show that students consistently prefer to use computers in the library, with 78 percent declaring a preference for the library over other computer locations on campus in both years of the study.
this preference is confirmed by the statistics reported by csusm’s campus it department, which tracks computer login data. this data consistently shows the library computer labs are used more than nonlibrary computer labs, with the computers near the library reference desk the most popular, followed closely by the library’s second large computer lab, which is located next to the technology help desk. for instance, during the 2010 study period, the reference desk lab (80 computers) had 6,247 logins compared to 3,218 logins in the largest nonlibrary lab (108 computers)—double the amount of usage. the data also shows that use of the computers near the reference desk increased by 15 percent between 2007 and 2010. supporting the popularity of using computers in the library is the fact that most students are repeat customers. table 2 shows 82 percent of the 2009 late-term respondents used the library computers at least several times a week, with almost half using our computers daily. in contrast, 72 percent of the 2010 early-term students used the library computers daily or several times a week. the 10 percent drop in frequency of visits to the library for computing applied to both laptop and desktop users and seems to be largely due to students not yet having received enough work from classes to justify more frequent use. the kind of computer that users preferred changed somewhat over the course of the study. the preference for desktop computers dropped from 84 percent of students in 2009 to 72 percent in 2010 (see table 3). one reason for this 12 percent drop may be related to how the survey was administered. the 2010 study did a more thorough job of surveying the fourth and fifth library floors, where most laptop users are. as a result, the laptop floors represented 29 percent of the responses in 2010 vs. only 13 percent in 2009. these numbers are also reflected in the proportion of laptops observed each year—33 percent in 2010 vs. 17 percent in 2009 (see table 1).
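the login comparison above can also be normalized per computer, which makes the difference even starker. this is a back-of-the-envelope calculation of ours from the reported figures, not a statistic from the campus it department:

```python
# per-computer normalization of the login statistics cited above
# (our own calculation from the reported figures).
library_logins, library_computers = 6247, 80         # reference desk lab
nonlibrary_logins, nonlibrary_computers = 3218, 108  # largest nonlibrary lab

print(round(library_logins / nonlibrary_logins, 2))        # ~1.94x raw logins
print(round(library_logins / library_computers, 1))        # ~78.1 logins per computer
print(round(nonlibrary_logins / nonlibrary_computers, 1))  # ~29.8 logins per computer
```

so the library lab saw roughly double the raw logins, but since it also had fewer machines, each library computer was used more than two and a half times as heavily as a nonlibrary one.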
the drop in desktop computer preference is interesting because it was not matched by an equally large increase in laptop preference, which only increased by 5 percent. the other reason for the decrease in desktop preference is likely the larger change seen nationwide in student laptop ownership. for instance, the pew study of gadget ownership showed a 13 percent drop in desktop ownership over a five-year period, 2006–2011, while at the same time laptop ownership almost doubled, from 30 percent to 56 percent.17 however, it is interesting to note that, according to the pew study, in 2011 the percentage of adults who owned each type of device was nearly equal—55 percent for desktops and 56 percent for laptops. the 2010 survey tried to better understand students’ preferences by identifying all the kinds of technology they had available to them. we found that 77 percent of csusm students owned laptops and an additional 7 percent owned the netbook form of laptop (see table 4). the combined 84 percent laptop ownership is comparable with the 2010 ecar study’s finding of 89 percent student laptop ownership nationwide.18 this high level of laptop ownership may explain why the users who preferred laptop computers almost all preferred to use their own rather than laptops checked out from the library. despite the high laptop ownership and decrease in desktop preference, it is significant that the majority of csusm students still prefer to use desktop computers in the library. aside from the 72 percent of respondents who specifically stated a preference for desktop computers, the top suggestion for library improvement was to add more desktop computers, requested by 38 percent of respondents. further analysis of the survey data revealed that it was the laptop owners and the fourth- and fifth-floor laptop users who were the primary requestors of more desktop computers.
to try to better understand this seemingly contradictory behavior, we did some further investigation. anecdotal conversations with users during the survey indicated that convenience and reliability are two factors affecting students’ decisions to use desktop computers. the desktop computers’ speed and reliable internet connections were regarded as particularly important when uploading a final project to a professor, with some students stating they came to the library specifically to upload an assignment. in may 2012, the csusm library held a focus group that provided additional insight into the question of desktops vs. laptops. all eight of the focus group participants owned laptops, yet all eight indicated that they preferred to use desktop computers in the library. when asked why, participants cited the reliability and speed of the desktop computers and the convenience of not having to remember to bring their laptop to school and “lug” it around. another influence on the convenience factor may be that our campus does not require that students own a laptop and bring it to class, so they may have less motivation to travel with their laptops. supporting the idea that students perceive different benefits for each type of computer, six of the eight participants owned a desktop computer in addition to a laptop. the 2010 study also showed that students see value in owning both a desktop and a laptop computer, since the 40 percent ownership of desktop computers overlaps the 84 percent ownership of laptops (see table 4).

table 6. reasons students prefer using library computer areas

for almost half of the students surveyed, one of the reasons for their preference for using computers in the library was the ready access to library services or staff (see table 6).
even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. the top two uses for the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. the most popular library service was the media library (20/22 percent), followed by collections, with 16/13 percent planning to check out a book and 15/13 percent planning to look for journals and newspapers. it is interesting that the level of use of these library services was similar whether early or late in the term. the biggest difference was that early-term students were less likely to be working with a group but were slightly more likely to be engaged in general studying. even the less-used services, such as asking a question (10 percent) or using a reserve book (8 percent), exhibited a notable amount of usage if one looks at the actual numbers. for example, 8 percent of the 1,123 2010 survey respondents represents 90 students who used reserve materials sometime during the 8 hours of the two-week survey period. to put the use of the library by computer users into perspective, we also asked students using the nonlibrary computer lab if they planned to use the library sometime that same day. only 24 percent of the nonlibrary computer users planned to study in the library that day vs. 81 percent of the library computer users; only 4 percent planned to use media vs. 24 percent; and 2 percent planned to check out a book vs. 13 percent. the implication is clear: students using computers in the library are much more likely to use the library’s other services.
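the conversion from percentages back to headcounts, as done above for reserve-book users, generalizes to any of the survey figures. a minimal sketch, with a helper function of our own:

```python
# converting survey percentages back into approximate headcounts, as the
# article does for reserve-book users; the helper function is ours.
RESPONDENTS_2010 = 1123  # completed 2010 surveys, per the text

def headcount(percent: float, n: int = RESPONDENTS_2010) -> int:
    """approximate number of respondents behind a reported percentage."""
    return round(n * percent / 100)

print(headcount(8))   # ~90 students who used reserve materials
print(headcount(10))  # ~112 students who asked a question
```

this confirms the article's figure: 8 percent of 1,123 respondents is about 90 students.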
we usually think of providing desktop computers as a service for students, and so it is. however, the study results show that providing computers also benefits the library itself: it reinforces the library’s role as place by providing a complete study environment for students and encouraging all study behaviors, including communication and working with others. the popularity of the library computers provides us with a “captive audience” of repeat customers.

conclusion

the csusm library technology that was planned in 2004 is still meeting students’ needs. although most of our students own laptops, most still prefer to use desktop computers in the library. in fact, providing a full-service computer environment to support the entire research process benefits the entire library. students who use computers in the library appear to conduct more of their studying in the library and thus make more use of traditional library collections and services. going forward, several questions arise for future studies. csusm is a commuter school. students often treat their work space in the library as their office for the day, which increases the importance of a reliable and comfortable computer arrangement. one question that could be asked is whether the results would be different for colleges where most students live on campus or nearby. if a university requires that all students own laptops and expects them to bring them to class, how does that affect the relevance of desktop computers in the library? the 2010 study was completed just a few weeks before the first ipad was introduced. since students have identified convenience and weight as reasons for not carrying their laptops, are tablets and ultra-light computers, like the macbook air, more likely to be carried on campus by students and used more frequently for their research?
how important is it to have a supportive mobile infrastructure with features such as high-speed wifi, the ability to use campus printers, and access to campus applications? are students using smartphones and other mobile devices for study purposes? in fact, are we focusing too much on laptops, and are other mobile devices starting to take over that role? this study’s results make it clear that we can’t just look at data such as ecar’s, which show high laptop ownership, and assume that means students don’t want or won’t use library computers. as the types of mobile devices continue to grow and evolve, libraries should continue to develop ways to facilitate their research role. however, the bottom line may not be that one technology will replace another but rather that students will have a mix of devices and will choose which device is best suited to a particular purpose. therefore libraries, rather than trying to pick which device to support, may need to develop a broad-based strategy to support them all.

references

1. susan m. thompson and gabriella sonntag, “chapter 4: building for learning: synergy of space, technology and collaboration,” learning commons: evolution and collaborative essentials (oxford: chandos publishing, 2008): 117–199.
2. heidi m. steiner and robert p. holley, “the past, present, and possibilities of commons in the academic library,” reference librarian 50, no. 4 (2009): 309–332.
3. michael j. whitchurch and c. jeffery belliston, “information commons at brigham young university: past, present, and future,” reference services review 34, no. 2 (2006): 261–78.
4. harold shill and shawn tonner, “creating a better place: physical improvements in academic libraries, 1995–2002,” college & research libraries 64 (2003): 435.
5. barbara i. dewey, “social, intellectual, and cultural spaces: creating compelling library environments for the digital age,” journal of library administration 48, no.
1 (2008): 85–94; diane dallis and carolyn walters, “reference services in the commons environment,” reference services review 34, no. 2 (2006): 248–60.
6. liz burke et al., “where and why students choose to use computer facilities: a collaborative study at an australian and united kingdom university,” australian academic & research libraries 39, no. 3 (september 2008): 181–97.
7. shannon d. smith and judith borreson caruso, the ecar study of undergraduate students and information technology, 2010 (boulder, co: educause center for applied research, october 2010), http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed march 21, 2012).
8. pew internet & american life project, “adult gadget ownership over time (2006–2012),” http://www.pewinternet.org/static-pages/trend-data-(adults)/device-ownership.aspx (accessed june 14, 2012); the horizon report: 2009 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2010 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012); the horizon report: 2011 edition, the new media consortium and educause learning initiative, http://net.educause.edu/ir/library/pdf/hr2011.pdf (accessed march 21, 2012).
9. pew internet, “adult gadget ownership.”
10. deborah keyek-franssen et al., computer labs study, university of colorado boulder office of information technology, october 7, 2011, http://oit.colorado.edu/sites/default/files/labsstudy-penultimate-10-07-11.pdf (accessed june 15, 2012).
11. j.
s. c. tom, k. voss, and c. scheetz[full names?], “the space is the message: first assessment of a learning studio,” educause quarterly 31, no. 2 (2008), http://www.educause.edu/ero/article/space-message-first-assessment-learning-studio (accessed june 25, 2012).
12. nick wingfield, “time to leave the laptop behind,” wall street journal, february 23, 2009, http://online.wsj.com/article/sb122477763884262815.html (accessed june 15, 2012).
13. the horizon report: 2009 edition.
14. the horizon report: 2010 edition.
15. the horizon report: 2011 edition.
16. alison j. head and michael b. eisenberg, “balancing act: how college students manage technology while in the library during crunch time,” project information literacy research report, information school, university of washington, october 12, 2011, http://projectinfolit.org/pdfs/pil_fall2011_techstudy_fullreport1.1.pdf (accessed june 14, 2012).
17. pew internet, “adult gadget ownership.”
18. smith and caruso, ecar study.

digital collections are a sprint, not a marathon: adapting scrum project management techniques to library digital initiatives

michael j. dulock and holley long

information technology and libraries | december 2015 5

abstract

this article describes a case study in which a small team from the digital initiatives group and metadata services department at the university of colorado boulder (cu-boulder) libraries conducted a pilot of the scrum project management framework. the pilot team organized digital initiatives work into short, fixed intervals called sprints—a key component of scrum.
working for more than a year in the modified framework yielded significant improvements to digital collection work, including increased production of digital objects and surrogate records, accelerated publication of digital collections, and an increase in the number of concurrent projects. adoption of sprints has improved communication and cooperation between participants, reinforced teamwork, and enhanced their ability to adapt to shifting priorities.

introduction

libraries in recent years have freely adapted methodologies from other disciplines in an effort to improve library services. for example, librarians have
• employed usability testing techniques to enhance users’ experience with digital library interfaces,1 improve the utility of library websites,2 and determine the efficacy of a visual search interface for a commercial library database;3
• adopted participatory design methods to identify information visualizations that could augment digital library services4 and determine user needs in new library buildings;5 and
• utilized principles of continuous process improvement to enhance workflows for book acquisition and implementation of serial title changes in a technical services unit.6
librarians often come to the profession with disciplinary knowledge from an undergraduate degree unrelated to librarianship, so it should come as no surprise that they bring some of that disciplinary knowledge to their work. the interdisciplinary nature of librarianship also creates an environment that is amenable to adoption or adaptation of techniques from a variety of sources, not only those originating in library science. in this paper, the authors describe their experiences
michael j. dulock (michael.dulock@colorado.edu) is assistant professor and metadata librarian, university of colorado boulder.
holley long (longh@uncw.edu), previously assistant professor and systems librarian for digital initiatives at university of colorado boulder, is digital initiatives librarian, randall library, university of north carolina wilmington.
digital collections are a sprint, not a marathon | dulock and long | doi: 10.6017/ital.v34i4.5869 6
in applying a modified scrum management framework to facilitate digital collection production. they begin by elucidating the fundamentals of scrum, then describe a pilot project using aspects of the methodology. they discuss the outcomes of the pilot and posit additional features of scrum that may be adopted in the future.

fundamentals of scrum project management

the scrum project management framework—one of several techniques under the rubric of agile project management—originated in software development and has been applied in a variety of library contexts, including the development of digital library platforms7 and library web applications.8 scrum’s salient characteristics include self-managing teams that organize their work into “short iterations of clearly defined deliverables” and focus on “communication over documentation.”9 the scrum primer: a lightweight guide to the theory and practice of scrum describes the roles, tools, and processes involved in this project management technique.10 scrum teams are cross-functional and consist of five to nine members who are cross-trained to perform multiple tasks. in addition to the team, two individuals serve specialized roles: scrum master and product owner. the scrum master is responsible for ensuring that scrum principles are followed and for removing any obstacles that hinder the team’s productivity. hence the scrum master is not a project manager but a facilitator. the product owner’s role is to manage the product by identifying and prioritizing its features.
this individual represents the stakeholders’ interests and is ultimately responsible for the product’s value. the team divides their work into short, fixed intervals called sprints that typically last two to four weeks and are never extended. at the beginning of each sprint, the team meets to select and commit to completing a set of deliverables. once these goals are set, they remain stable for the duration; course corrections can occur in later sprints. in software development, the scrum team aims to complete a unit of work that stands on its own and is fully functional, known as a potentially shippable increment. it is selected from an itemized list of product features called the product backlog. the backlog is established at the outset of development and consists of a comprehensive list of tasks that must occur to complete the product. a well-constructed backlog has four characteristics. first, it is prioritized with the features that will yield the highest return on investment at the top of the list. second, the backlog is appropriately detailed, so that the tasks at the top of the list are well-defined whereas those at the bottom may be more vaguely demarcated. third, each task receives an estimation for the amount of effort required to complete it, which helps the team to project a timeline for the product. finally, the backlog evolves in response to new developments. individual tasks may be added, deleted, divided, or reprioritized over the life of the project. during the course of a sprint, team members meet to plan the sprint, check-in on a daily basis, and then debrief at the conclusion of the sprint. they begin with a two-part planning meeting in which the product owner reviews the highest priority tasks with the team. in the second half of the meeting, the team and the scrum master determine how many of the tasks can be accomplished in information technologies and libraries |december 2015 7 the given timeframe, thus defining the goals for the sprint. 
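the four backlog characteristics described above (prioritized, appropriately detailed, effort-estimated, and evolving) can be sketched as a small data structure. this is an illustrative python sketch only; the class and field names are hypothetical, not part of scrum or the scrum primer:

```python
from dataclasses import dataclass

@dataclass
class BacklogItem:
    description: str
    priority: int       # lower number = higher expected return on investment
    effort_points: int  # the team's estimate of the effort required
    detail: str = ""    # top-of-list items are well-defined; lower ones may stay vague

class ProductBacklog:
    def __init__(self):
        self.items = []

    def add(self, item):
        # backlogs evolve: items may be added, divided, or reprioritized at any time
        self.items.append(item)
        self.items.sort(key=lambda i: i.priority)  # keep highest-priority tasks on top

    def top(self, n):
        # sprint planning draws deliverables from the top of the list
        return self.items[:n]

backlog = ProductBacklog()
backlog.add(BacklogItem("publish collection online", priority=3, effort_points=2))
backlog.add(BacklogItem("scan correspondence folder", priority=1, effort_points=5))
print([i.description for i in backlog.top(1)])  # ['scan correspondence folder']
```

the key property is that the list stays sorted as it evolves, so the top of the backlog always reflects the current highest-value work.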
this meeting generally lasts no longer than four hours for a two-week sprint. every day, the team holds a brief meeting to get organized and stay on track. during these “daily scrums,” each team member shares three pieces of information: what has been accomplished since the previous meeting, what will be accomplished before the next meeting, and what, if any, obstacles are impeding the work. these fifteen-minute meetings provide the team with a valuable opportunity to communicate and coordinate their efforts. sprints conclude with two meetings, a review and retrospective. during the review, the team inspects the deliverables that were produced during that sprint. the retrospective provides an opportunity to discuss the process, what is working well, and what needs to be adjusted.

figure 1. typical meeting schedule for a two-week sprint

evidence in the literature suggests that scrum improves both outcomes and process. one meta-analysis of 274 programming case studies found that implementing scrum led to improved productivity as well as greater customer satisfaction, product quality, team motivation, and cost reduction.11 proponents of this project management technique find that it leads to a more flexible and efficient process. scrum’s brief iterative work cycles and evolving product backlog promote adaptability so the team can address the inevitable changes that occur over the life of a project. by contrast, traditional project management techniques have been criticized for requiring too much time upfront on planning and being too rigid to respond to changes in later stages of the project.12 scrum also promotes communication over documentation,13 resulting in less administrative overhead as well as increased accountability and trust between team members.
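the meeting cadence just described, a planning meeting on day one, a short daily scrum thereafter, and a closing review and retrospective, can be laid out for a two-week sprint with a short sketch. the start date, helper name, and the choice to hold the review on the final workday are assumptions for illustration only:

```python
from datetime import date, timedelta

def sprint_schedule(start, workdays=10):
    # collect the sprint's working days (weekdays only)
    days, d = [], start
    while len(days) < workdays:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    # planning on day one, daily scrums thereafter, review/retrospective at the end
    schedule = [(days[0], "sprint planning (up to four hours)")]
    schedule += [(d, "daily scrum (fifteen minutes)") for d in days[1:]]
    schedule += [(days[-1], "sprint review and retrospective")]
    return schedule

for day, meeting in sprint_schedule(date(2015, 12, 7)):
    print(day.isoformat(), meeting)
```

for a sprint starting on a monday, this yields eleven meeting entries across ten working days, mirroring the schedule shown in figure 1.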
scrum pilot at university of colorado boulder libraries

the university of colorado boulder (cu-boulder) libraries digital initiatives team was interested in adopting scrum because of its incremental approach to completing large projects, its focus on communication, and its flexibility. these attributes meshed well with the group’s goals to publish larger collections more quickly and to more effectively multitask the production of multiple high-priority collections. the group’s staffing model and approach to collection building prior to the scrum pilot are described here to provide some context for this choice of project management tool. digital collection proposals are vetted by a working group composed of ten members, the digital library management group (dlmg), to ensure that major considerations such as copyright status are fully investigated before undertaking the collection. approved proposals are prioritized by the appropriate collection manager as high, medium, or low and then placed in a queue for scanning and metadata provisioning. a core group of individuals generally works on all digital collections, including the metadata librarian, the digital initiatives librarian, and one or both of the digitization lab managers. additionally, the team frequently includes the subject specialist who nominated the collection for digitization, staff catalogers, and other library staff members whose expertise is required. at any given time, the queue may contain as many as fifteen collections, and the core team works on several of them concurrently to address the separate needs of participating departments. while this approach allows the team to distribute resources more equitably across departments, progress on individual collections can be slower than if they are addressed one at a time.
prior to implementing aspects of scrum, the team also completed the scanning and metadata records for every object in the collection before it was published. as a result, publication of larger collections trailed behind smaller collections. the details of digital collection production vary depending on the nature of the project, but the process usually follows the same broad outline. unless the entire collection will be digitized, the collection manager chooses a selection of materials on the basis of criteria such as research value, rarity, curatorial considerations, copyright status, physical condition, feasibility for scanning, and availability of metadata. photographs and paper-based materials are then evaluated by the preservation department to ensure that they are in suitable condition for scanning. likewise, the media lab manager evaluates audio and video media for condition issues such as sticky shed syndrome, which will affect digitization.14 depending on format, the material is then digitized by the digitization lab manager or the media lab manager and their student assistants according to locally established workflows that conform to nationally recognized best practices. once materials are digitized, student assistants apply post-processing procedures as appropriate and required, such as running ocr (optical character recognition) software to convert images to text or equalizing levels on an audio file. the lab managers then check the files for quality assurance and move the files to the appropriate location on the server. the metadata librarian creates a metadata template appropriate to the material being digitized by using industry standards such as visual resources association core (vra core), metadata object description schema (mods), pbcore, and dublin core (dc). metadata creation methods depend on the existence of legacy metadata for the analog materials and in what format legacy metadata is contained.
the metadata librarian, along with his staff and/or student assistants, adapts legacy metadata into a format that can be ingested by the digital library software or creates records directly in the software when there is no legacy metadata. metadata is formatted or created in accordance with existing input standards such as cataloging cultural objects (cco) and resource description and access (rda), and it is enhanced as much as possible using controlled vocabularies such as the art and architecture thesaurus (aat) and library of congress subject headings. the metadata librarian performs quality assurance on the metadata records during creation and before the collection is published. in the final stages, the collection is created in the digital library software, at which time search and display options are established: thumbnail labels, default collection sorting, faceted browsing fields, etc. then the files and metadata are uploaded and published online. the highlight of the cu-boulder digital library is the twenty-seven collections drawn from local holdings in archives, special collections department, music library, and earth sciences and map library, among others. the library also contains purchased content and “luna commons” collections created by institutions that use the same digital library platform, for a total of more than 185,000 images, texts, maps, audio recordings, and videos. the following four collections were created during the scrum pilot and illustrate the types of materials available in the cu-boulder digital library: the colorado coal project consists of video and audio interviews, transcripts, and slides collected between 1974 and 1979 by the university of colorado coal project.
the project was funded by the colorado humanities program and the national endowment for the humanities to create an ethnographic record of the history of coal mining in the western united states from immigration and daily life in the coal camps to labor conditions and strikes, including ludlow (1913–14) and columbine (1927). the mining maps collection provides access to scanned maps of various mines, lodes, and claims in colorado from the late 1800s to the early 1900s. these maps come from a variety of creators, including private publishers and us government agencies. the vasulka media archive showcases the work of pioneering video artists steina and woody vasulka and contains some of their cutting-edge studies in video that experiment with form, content, and presentation. steina, an icelander, educated in music at the prague conservatory of music, and woody, a graduate of prague's film academy, arrived in new york city just in time for the new media explosion. they brought with them their experience of the european media awakening, which helped them blend seamlessly into the youth media revolution of the late sixties and early seventies in the united states. the 3d natural history collection comprises one hundred archaeology and paleontology specimens from the rocky mountain and southwest regions, including baskets, moccasins, animal figurines, game pieces, jewelry, tools, and other everyday objects from the fremont, clovis, and ancestral puebloan cultures as well as a selection of vertebrate, invertebrate, and track paleontology specimens from the mesozoic through the cenozoic eras (250 ma to the present). the diffusion of effort across multiple collections and a slower publication rate for larger collections offered opportunities for improvement.
after attending a conference session on scrum project management for web development projects, one of the team members recognized scrum’s potential to improve production processes since the technique divides large projects into manageable subtasks that can be accomplished in regular, short intervals.15 this approach would allow the team to switch between different high priority collections at regularly defined intervals to facilitate steady progress on competing priorities. working in sprints would also make it easier to publish smaller portions of a large collection at regular intervals. thus scrum held the potential to increase the production rate for larger collections and make the team’s progress more transparent to users and colleagues. in april 2013, a small team of cu-boulder librarians and staff initiated a pilot to assess the effect on processes and outcomes for digital collection production. rather than involving individuals from all affected units, regardless of their level of engagement in a particular project, the scrum pilot was limited to the three individuals who were involved in most, if not all, of the projects undertaken: the digital initiatives librarian, metadata librarian, and digitization lab manager.16 by including these three individuals, the major functions of metadata provision, digitization, and publication were covered in the trial with no disruption to the existing workflows or organizational structures. selecting this group also ensured that scrum would be tested in a broad range of scenarios and on collections from several different departments. to begin, the team met to review the scrum project management framework and considered how best to pilot the technique. taking a pragmatic approach, they only adopted those aspects of scrum that were deemed most likely to result in improved outcomes.
if the pilot were successful, other aspects of scrum could be incrementally incorporated later. the group discussed how scrum roles, processes, and tools could be adapted to digital collection workflows and determined that sprints would likely have the highest return on investment. they also chose to adapt and hybridize certain aspects of the planning meeting and daily scrum to achieve goals that were not being met by other existing meetings. sprint planning and end meetings were combined so that all three participants knew what each had completed and what was targeted for the next sprint. select activities of sprint planning and end meetings were already a part of the monthly dlmg meetings, making additional sprint meetings redundant. daily scrum meetings were excluded as the team felt that daily meetings would not produce enough benefit to justify the costs. in addition, two of the three participants have numerous responsibilities that lie outside of projects subject to the scrum pilot, so each person does not necessarily perform scrum-related work every day. however, the short meeting time was adopted into the planning/end meeting, as were elements of the three core questions of the daily scrum meeting, with some modifications. the questions addressed in the biweekly meetings are: what have you done since the last meeting? what are you planning for the next meeting? what impediments, if any, did you encounter during the sprint? the latter question was sometimes addressed mid-sprint through emails, phone calls, or one-off meetings that include a larger or different group of stakeholders. the team adopted the two-week duration typical of scrum sprints for the pilot. this has proven to be a good medium-term timeframe. it was short enough that the team could adjust priorities quickly, but long enough to complete significant work. the team chose to combine the sprint planning and sprint review meetings into a single meeting. 
part of the motivation for a trial of the scrum technique was to minimize additional time away from projects while maximizing information transfer during the meetings. a single biweekly planning/review meeting was determined to be sufficient to report accomplishments and set goals, and could be kept substantive and free of irrelevant content without becoming overly burdensome as “yet another meeting.” at each sprint meeting, each participant reported on results from the previous sprint. work that was completed allowed the next phase of a project to proceed. based on the results of the last sprint, each team member set measurable goals that could be realistically met in the next two-week sprint. there has been a concerted effort to keep the meetings short, limited to about twenty to twenty-five minutes. to enforce this habit, the sprint meetings were scheduled to begin twenty minutes before other regularly scheduled meetings for most or all of the participants. this helped keep participants on-topic and reinforced the transfer-of-information aspect of the meetings, with minimal leeway for extraneous topics.

reflection

the modified scrum methodology described above has been in place for more than a year. there have been several positive outcomes resulting from this practice. beginning with the most practical, production has become more regular than it was before scrum was implemented. the nature of digital initiatives in this environment dictates that many projects are in progress at once, in various stages of completion. the production work, such as digitizing media or creating metadata records, has become more consistent and regular. instead of production peaks and valleys, there is more of a straight line as portions of projects are finished and others come online. this in turn has resulted in faster publication of collections.
in 2013, the team published six new collections, twice as many as the previous year. the ability to put all hands on deck for a project for a two-week period can increase productivity. since sprints allow for short, concentrated bursts of work on a single project, smaller projects can be completed in a few sprints and larger projects can be divided into “potentially shippable units” and thus published incrementally. another benefit of scrum is that the variability of the two-week sprint cycle allows the team to work on more collections concurrently. for example, during a given sprint, scanning is underway for one collection, a metadata template is being constructed for another, the analog material in a third is being examined for pre-scanning preservation assessment, and a fourth collection is being published. while this type of multitasking occurred before the team piloted sprints, the scrum project management framework lends more structure and coordination to the various team members’ efforts. collection building activities can be broken down into subtasks that are accomplished in nonconsecutive sprints without undercutting the team’s concerted efforts. as a result, the team can juggle competing priorities much more effectively. the team is working with multiple stakeholders at any given time, each of whom may have several projects planned or in progress. as focus shifts among stakeholders and their respective projects, the scrum team is able to adjust quickly to align with those priorities, even if only for a single sprint. this also makes it easier to respond to emerging requests or address small, focused projects on the basis of events such as exhibits or course assignments. additional benefits of the scrum methodology pertain to communication and work style among the three scrum participants. the frequent, short meetings are densely packed and highly focused. 
each person has only a few minutes to describe what has been accomplished, explain problems encountered, troubleshoot solutions, and share plans for the next sprint. the return on the time investment of twenty minutes every two weeks is significant—there is no time to waste on issues that do not pertain directly to the projects underway, just completed, or about to start. a further result is that the group’s sense of itself as a team is enhanced. as stated above, the three scrum participants do not all work in the same administrative unit within the library. though they shared frequent communication by email as projects progressed, regular sprint meetings have fostered a closer sense of team. the participants know from sprint to sprint what the others are doing; they can assist one another with problems face-to-face and coordinate with one another so that work segments progress toward production in a logical sequence. with more than a year of experience with scrum, the pilot team has determined that several aspects of the methodology have worked well in our environment. in general, the sprint pattern fits well with existing operating modes. the monthly dlmg meeting, which includes a large and diverse group, provides an opportunity to discuss priorities, review project proposals, establish standards, and make strategic decisions. the bi-weekly sprint meetings dovetail nicely, with one meeting taking place at a midpoint between dlmg meetings, and one just prior to dlmg meetings. this allows the three scrum participants to focus on strategic items during the dlmg meeting but keep a close eye on operational items in between. the scrum methodology has also accommodated the competing priorities that the three participants must balance on an ongoing basis.
there is considerable variation between participants in terms of roles and responsibilities, but the division of work into sprints has given the team greater opportunity to fit production work in with other responsibilities, such as supervision and training; scholarly research and writing; service performed for disciplinary organizations; infrastructure building; and planning, research, and design work for future projects. the two-week sprint duration is a productive time interval during which the team can set and reach incremental goals, whether that is starting and finishing a small project on short notice, making a big push on a large-scale project, or continuing gradual progress on a large, deliberately paced initiative. the brief meetings ensure that participants focus on the previous sprint and the upcoming sprint. there is usually just enough time to discuss accomplishments, goals, and obstacles, with some time left to troubleshoot as necessary. the meeting schedule and structure allow each individual to set his or her own goals so that he or she can make maximum progress during the sprint. this in turn feeds into accountability. there is always an external check on one’s progress—the next meeting comes up in two weeks, creating an effective deadline (which also sometimes corresponds to a project deadline). it becomes easier to stay on task and keep goals in sight with the sprint report looming in a matter of days. at the same time, scrum helps to define each person’s role and clarifies how roles align with each other. some tasks are completely independent, while others must be done in sequence and depend on another’s work. the sprint schedule allows large, complex projects to be divided into manageable pieces so that each sprint can result in a sense of accomplishment, even if it may require many sprint cycles to actually complete a project. this is especially true for large digital initiatives.
for instance, completing the entire project may take a year, but subsets of a collection may be published in phases at more frequent intervals in the meantime.

summary of benefits

● enhanced ability to manage multiple concurrent projects
● published large collections incrementally, increasing responsiveness to users and other stakeholders
● improved team building
● increased communication and accountability among team members

future considerations

based on these outcomes, the team can safely say that it met its objectives for the pilot. one of the reasons that it was feasible to try this when the participants were already highly committed is that the pilot used a small portion of the scrum methodology and was not too rigid in its approach. the team felt that a hybrid of the scrum planning and scrum review meeting held twice a month would provide the benefits without overburdening schedules with additional meetings. there were also plans to have a virtual email check-in every other week to loosely achieve the goals of the daily scrum meeting, that is, to improve communication and accountability. the email check-in fell by the wayside; the team found it wasn’t necessary because there were already adequate opportunities to check in with each other over the course of a two-week sprint. the team has found the sprints and modified scrum meetings to be highly useful and relatively easy to incorporate into their workflows. the next phase of the pilot will implement product backlogs and burn down charts, diagrams showing how much work remains for the team in a single sprint, with the goal of tracking collections’ progress at the item level through each step of the planning, selection, preservation assessment, digitization, metadata provisioning, and publication workflows.

figure 2. hypothetical backlog for the first sprint of a digital collection17

scrum backlogs are arranged on the basis of a task’s perceived benefit for customers. to adapt backlogs for digital collection production work, the backlog task list’s order will instead be based in part on the workflow sequence. for example, pieces from the physical collection must be selected before preservation staff can assess them. additionally, the backlog items will be sequenced according to the materials’ research value or complexity. for instance, the digitization of a folder of significant correspondence from an archival collection would be assigned a higher priority in the backlog than the digitization of newspaper clippings of minor importance from the same collection. or, materials that are easy to scan would be listed in the backlog ahead of fragile or complex items that require more time to complete. this will allow the team to publish the most valuable items from the collection more quickly. according to scrum best practices, backlogs are also appropriately detailed. in the context of digital collection production work, collections’ backlogs would begin with a standard template of high-level activities: materials’ selection, copyright analysis, preservation assessment, digitization, metadata creation, and publication. as the team progresses through backlog items, they will become increasingly detailed. backlogs also evolve. scrum’s ability to respond to change has been one of its strongest assets in this environment, and therefore the backlog’s ability to evolve will make it a valuable addition to the team’s process. for example, materials that a collection manager uncovers and adds to the project late in the process can be easily incorporated into the backlog, or materials in the collection that are needed to support an upcoming instruction session can be moved up in the backlog for the next sprint.
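the ordering scheme just described, workflow stage first and research value second, amounts to a simple sort key. in this illustrative python sketch, the stage names follow the standard template of high-level activities, while the task names and research-value scores are invented for the example:

```python
# order backlog items by workflow stage first, then by research value (higher first)
WORKFLOW = ["selection", "copyright analysis", "preservation assessment",
            "digitization", "metadata creation", "publication"]

def backlog_order(item):
    # item is a (task, workflow_stage, research_value) triple
    task, stage, research_value = item
    return (WORKFLOW.index(stage), -research_value)

items = [
    ("scan newspaper clippings", "digitization", 1),
    ("scan significant correspondence", "digitization", 5),
    ("select materials from archive", "selection", 3),
]
for task, _, _ in sorted(items, key=backlog_order):
    print(task)
```

sorting places selection work ahead of digitization, and within the digitization stage places the significant correspondence ahead of the minor newspaper clippings, so the most valuable items surface first.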
in this way, the backlog will support the team’s goal to nimbly respond to shifting priorities and emerging opportunities.

figure 3. hypothetical burn down chart18

the final relevant feature of a backlog, the “effort estimates,” taken in conjunction with the burn down chart will help the team develop better metrics for estimating the time and resources required to complete a collection. when items are added to the backlog, team members estimate the amount of effort needed to complete each one. the burn down chart illustrates how much work remains and, in general practice, is updated on a daily basis. given that the team has truncated the scrum meeting schedule, this may occur on a weekly basis, but will nonetheless benefit the team in several ways. initially, it will keep the team on track and provide valuable and detailed information for stakeholders on the collections’ progress. as the team accrues old burn down charts from completed collections, they can use the data to hone their ability to estimate the amount of time and resources needed to complete a given project.

conclusion

through the pilot conducted for digital initiatives at cu-boulder libraries, application of aspects of the scrum project management framework has demonstrated significant benefits with no discernible downside. adoption of sprint planning and end meetings resulted in several positive outcomes for the participants. digital collection production has become more regular; work can be underway on more collections simultaneously; and collections are, on average, published more quickly. in addition, communication and cooperation among the sprint pilot participants have increased and strengthened the sense of teamwork among them. the sprint schedule has blended well with existing digital initiatives meetings and workflows, and has enhanced the team’s ability to handle ever-shifting priorities.
additional aspects of scrum, such as product backlogs and burn down charts, will be incorporated into the participants’ workflows to allow them to better track the work done at the item level, provide more detailed information for stakeholders during the course of a project, and predict how much time and effort will be required for future projects. the positive results of this pilot demonstrate the benefits to be gained by looking outside standard library practice and adopting techniques developed in another discipline. given the range of activities performed in libraries, the possibilities to improve workflows and increase efficiency are limitless as long as those doing the work keep an open mind and a sharp eye out for methodologies that could ultimately benefit their work, and in turn, their users. references 1. sueli mara ferreira and denise nunes pithan, “usability of digital libraries,” oclc systems & services: international digital library perspectives 21, no. 4 (2005): 316, doi: 10.1108/10650750510631695. 2. danielle a. becker and lauren yannotta, “modeling a library web site redesign process: developing a user-centered web site through usability testing,” information technology & libraries 32, no. 1 (2013): 11, doi: 10.6017/ital.v32i1.2311. 3. jodi condit fagan, “usability testing of a large, multidisciplinary library database: basic search and visual search,” information technology & libraries 25 no. 3 (2006): 140–41, 10.6017/ital.v25i3.3345. http://dx.doi.org/10.1108/10650750510631695 http://dx.doi.org/10.6017/ital.v32i1.2311 http://dx.doi.org/10.6017/ital.v25i3.3345 information technologies and libraries |december 2015 17 4. panayiotis zaphiris, kulvinder gill, terry h.-y. ma, stephanie wilson and helen petrie, “exploring the use of information visualization for digital libraries,” new review of information networking 10, no. 1 (2004): 58, doi: 10.1080/1361457042000304136. 5. 
benjamin meunier and olaf eigenbrodt, “more than bricks and mortar: building a community of users through library design,” journal of library administration 54 no. 3 (2014): 218–19, 10.1080/01930826.2014.915166. 6. lisa a. palmer and barbara c. ingrassia, “utilizing the power of continuous process improvement in technical services,” journal of hospital librarianship 5 no. 3 (2005): 94–95, 10.1300/j186v05n03_09. 7. javier d. fernández et al., “agile dl: building a delos-conformed digital library using agile software development,” in research and advanced technology for digital libraries, edited by birte christensen-dalsgaard et al. (berlin: springer-verlag, 2008), 398–9, doi: 10.1007/978-3540-87599-4_44. 8. michelle frisque, “using scrum to streamline web applications development and improve transparency” (paper presented at the 13th annual lita national forum, atlanta, georgia, september 30–october 3, 2010). 9. frank h. cervone, “understanding agile project management methods using scrum,” oclc systems & services 27, no. 1 (2011): 19, 10.1108/10650751111106528. 10. pete deemer, gabrielle benefield, craig larman, and bas vodde, “the scrum primer: a lightweight guide to the theory and practice of scrum," (2012), 3-15, www.infoq.com/minibooks/scrum_primer. 11. eliza s. f. cardozo et al., “scrum and productivity in software projects: a systematic literature review” (paper presented at the 14th international conference on evaluation and assessment in software engineering (ease), 2010), 3. 12. cervone, “understanding agile project management,” 18. 13. ibid., 19. 14. sticky shed syndrome refers to the degradation of magnetic tape where the binder separates from the carrier. the binder can then stick to the playback equipment rendering the tape unplayable. 15. frisque, “using scrum.” 16. 
the media lab manager responsible for audio and video digitization did not participate because his lab offers fee-based services to the public and thus has long-established business processes in place that would not have blended easily with sprints. 17. figure 2 is based on an illustration created by mountain goat software, “sprint backlog,” https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog. 18. figure 3 is adapted from a template created by expert project management, “burn down chart template,” www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls. eclipse editor for marc records bojana dimić surla information technology and libraries | september 2012 65 abstract editing bibliographic data is an important part of library information systems. in this paper we discuss existing approaches in developing user interfaces for editing marc records. there are two basic approaches: screen forms that support entering bibliographic data without knowledge of the marc structure, and direct editing of marc records shown on the screen. this paper presents the eclipse editor, which fully supports editing of marc records. it is written in java as an eclipse plug-in, so it is platform-independent. it can be extended for use with any data store. the paper also presents a rich client platform (rcp) application made of the marc editor plug-in, which can be used outside of eclipse.
the practical application of the results is the integration of the rcp application into the bisis library information system. introduction an important module of every library information system (lis) is the one for editing bibliographic records (i.e., cataloging). most library information systems store their bibliographic data in the form of marc records. some of them support cataloging by direct editing of marc records; others have a user interface that enables entering bibliographic data by a user who knows nothing about how marc records are organized. the subject of this paper is user interfaces for editing marc records. it gives software requirements and analyzes existing approaches in this field. as the main part of the paper, we present the eclipse editor for marc records, developed at the university of novi sad as a part of the bisis library information system. the editor uses the marc 21 variant of the marc format. the remainder of this paper describes the motivation for the research, presents the software requirements for cataloging according to marc standards, and provides background on the marc 21 format. it also describes the development of the bisis software system, reviews the literature concerning tools for cataloging, and analyzes existing approaches in developing user interfaces for editing marc records. the results of the research are presented in the final section, which describes the functionality and technical characteristics of the eclipse marc editor. the rich client platform (rcp) version of the editor, which can be used independently of eclipse, is also presented. motivation the motivation for this paper was to provide an improved user interface for cataloging by the marc standard that will lead to more efficient and comfortable work for catalogers. bojana dimić surla (bdimic@uns.ns.ac.yu) is an associate professor, university of novi sad, serbia.
there are two basic approaches in developing user interfaces for marc cataloging. the first approach uses a classic screen form made of text fields and labels with descriptions of the bibliographic data, without any indication of the marc standard. the second approach is direct editing of a record that is shown on the screen. those two approaches will be discussed in detail in “existing approaches in developing user interfaces for editing marc records” below. the current editor in the bisis system is a mixture of these two approaches—it supports direct editing, but data input is done via a text field, which opens on double click.1 the idea presented in this paper is to create an editor that overcomes all drawbacks of previous solutions. the approach taken in creating the editor was direct record editing with real-time validation and no additional dialogs. software requirements for marc cataloging the user interface for marc cataloging needs to support the following functions:
• creating marc records that satisfy constraints proposed by the bibliographic format
• selecting codes for field tags, subfield names, and values of coded elements, such as character positions in the leader and control fields, indicators, and subfield content
• validating entered data
• access to data about the marc format (a “user manual” for marc cataloging)
• exporting and importing created records
• providing various previews of the record, such as catalog cards
background marc 21 as was previously mentioned, the eclipse editor uses the marc 21 variant. marc 21 consists of five formats: bibliographic data, authority data, holdings data, classification data, and community information.2 marc 21 records consist of three parts: record leader, set of control fields, and set of data fields. the record leader content, which follows the ldr label, includes the logical length of the record (first five characters) and the code for record status (sixth character).
after the record leader, there are control fields. every control field is written on a new line and consists of a three-character numeric tag and the content of the control field. the content of a control field can be a single datum or a set of fixed-length bibliographic data. control fields are followed by data fields in the record. every line in the record that contains a data field consists of a three-character numeric tag, the values for the first and the second indicator—or the number sign (#) if an indicator is not defined for the field—and the list of subfields that belong to the field. detailed analysis of marc 21 shows that there are some constraints on the structure and content of the marc 21 record. constraints on the structure define which fields and subfields can appear more than once in the record (i.e., whether the fields and subfields are repeatable or not), the allowed length of the record elements, and all the elements of the record defined by marc 21. constraints on the record content are defined on the content of the leader, indicators, control fields, and subfields. moreover, some constraints connect several elements in the record (when the content of one element depends on the content of another element in the record). an example of a constraint on the structure for data field 016 is that the field has a first indicator whereas the second indicator is undefined. field 016 can have subfields a, z, 2, and 8, of which z and 8 are repeatable. bisis the results presented in this paper belong to the research on the development of the bisis library information system. this system, which has been in development since 1993, is currently in its fourth version. the editor for cataloging in the current version of bisis was the starting point for the development of eclipse, the subject of this paper.
3 apart from an editor for cataloging, the bisis system has a module for circulation and an editor for creating z39.50 queries.4 the indexing and searching of bibliographic records was implemented using the lucene text server.5 as a part of the editor for cataloging, we developed a module for generating various reports and catalog cards from marc records.6 bisis also supports creating an electronic catalog of unimarc records on the web, where the input of bibliographic data can be done without knowing unimarc; the entered data are mapped to unimarc and stored in the bisis database.7 the recent research within the bisis project relates to its extension for managing research results at the university of novi sad. for that purpose, we developed a current research information system (cris) following the recommendations of the nonprofit organization eurocris.8 the paper “cerif compatible data model based on marc 21 format” proposes a data model, compatible with the common european research information format (cerif), that is based on marc 21. in this model, the part of the cerif data model that relates to research results is mapped to marc 21. furthermore, on the basis of this model, a research management system at the university of novi sad was developed.9 the paper “cerif data model extension for evaluation and quantitative expression of scientific research results” explains the extension of cerif for the evaluation of published scientific research. the extension is based on the semantic layer of cerif, which enables classification of entities and their relationships by different classification schemas.10 the current version of the bisis system is based on a variant of the unimarc format. the development of the next version of bisis, which will be based on marc 21, is in progress. the first task was migrating the existing unimarc records.11 the second task is developing the editor for marc 21 records, which is the subject of this paper.
cataloging tools an editor for cataloging is a standard part of a cataloger’s workstation and the subject of numerous studies. lange describes the development of cataloging from handwritten catalog cards, to typewriters (first manual, then electronic), to the appearance of marc records and pc-based cataloger’s workstations.12 leroya and thomas debate the influence of web development on cataloging. they stress that the availability of information on the web, as well as the possibility of having several applications open at the same time in different windows, greatly influences the process of creating bibliographic records. their paper also indicates that there are some problems that result from using large numbers of resources from the web, such as errors that arise from copy-paste methods. consequently, there is a need for an automatic check for spelling errors and the possibility of a detailed review by the cataloger during editing.13 khurshid deals with the general principles of the cataloger’s workstation, its configuration, and its influence on a cataloger’s productivity. in addition to efficient access to remote and local electronic resources, khurshid includes record transfer through a network and sophisticated record editing as important functions of a cataloger’s workstation. furthermore, khurshid says it is possible to improve cataloging efficiency in the windows-based cataloger’s workstation by finding bibliographic records in other institutions and cutting and pasting lengthy parts of the record (such as summary notes) into one’s own catalog.14 existing approaches in developing user interfaces for editing marc records the basic source for this analysis of existing user interfaces for editing marc records was the official marc standards site of the library of congress, in addition to scientific journals and conferences.
the analysis of existing systems shows that there are two basic approaches in the implementation of editing marc records:15
• entering bibliographic data in classic screen forms made of text fields and labels, which does not require knowledge of the marc format (concourse,16 koha,17 j-marc18)
• direct editing of a marc record shown on the screen (marcedit,19 isismarc,20 catalis,21 polaris,22 marcmaker and marcbreaker,23 exlibris voyager24).
both of these approaches have advantages and disadvantages. the drawback of the first approach is that it provides a limited set of bibliographic data to edit, and extending that set implies changes to the application or, in the best case, changes in configuration. another problem is that there are usually a lot of text fields, text areas, combo boxes, and labels on the screen that need to be organized into several tabs or additional windows. this situation usually makes it difficult for users to see errors or to connect different parts of the record when checking their work. moreover, all of the solutions found in the first group perform little validation of the data entered by the user.25 one important advantage of the first approach is that the application can be used by a user who is not familiar with the standard, so the need for access to marc data can be avoided (one of the functions listed above). as for the second approach, editing a marc record directly on the screen overcomes the problem of extending the set of bibliographic data to enter. it also enables users to scan entered data and check the whole record, which appears on the screen. users can also copy and paste parts of records from other resources into the editor.
however, a majority of those applications are actually editors for marc files that are later uploaded into some database or transformed into some other format (marcedit, marcmaker and marcbreaker, polaris), and they usually support little or no data validation.26 they allow users to write anything (i.e., the record structure is not controlled by the program) and validate only at the end of the process, when uploading or transforming the record. among these editors there are some, such as catalis and isismarc, that present the marc record as a table. they support control of the structure, but a record presented in this way is usually too big to fit on the screen, so it is separated into several tabs. an important function of editing marc records is selecting codes for coded elements, which can be character positions in the leader or a control field, values of the indicators, or values of subfields. there are also field tags and subfield codes that sometimes need to be selected for addition to a record. all of the analyzed editors provide additional dialogs for picking these codes, which requires the user to constantly open and close dialogs, which can sometimes be annoying. one important fact about editors in the second group is that they can be used only by a user who is familiar with marc, so access to a large set of marc element descriptions can make the job easier. some of the mentioned systems provide descriptions of the fields and subfields (e.g., isismarc), but most of them do not. findings the editor for marc records was developed as a plug-in for eclipse; therefore it is similar to eclipse’s java code editors. as the editor is written in java, it is platform-independent. the main part of the editor was created using the oaw xtext framework for developing textual domain-specific languages.27 it was created using model-driven software development, by specifying the model of the marc record in the form of an xtext grammar and generating the editor.
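the field 016 rule quoted earlier (first indicator defined, second undefined, subfields a, z, 2, and 8, of which only z and 8 are repeatable) is typical of the structural constraints such a specification has to capture. a minimal sketch of the corresponding check, using a simplified data model that is an assumption here, not the bisis implementation:

```java
import java.util.*;

/** sketch of a structural check for marc 21 data field 016, following the
 *  constraints quoted in the text: the second indicator is undefined
 *  (written '#'), subfields a, z, 2, 8 are allowed, and only z and 8 are
 *  repeatable. the data model is a simplified assumption. */
public class Marc016Check {
    private static final Set<Character> ALLOWED = Set.of('a', 'z', '2', '8');
    private static final Set<Character> REPEATABLE = Set.of('z', '8');

    public static List<String> validate(char ind2, List<Character> subfields) {
        List<String> errors = new ArrayList<>();
        if (ind2 != '#')
            errors.add("second indicator of 016 is undefined; use '#'");
        Map<Character, Integer> counts = new HashMap<>();
        for (char code : subfields) {
            if (!ALLOWED.contains(code))
                errors.add("subfield $" + code + " is not defined for 016");
            counts.merge(code, 1, Integer::sum);
        }
        for (Map.Entry<Character, Integer> e : counts.entrySet())
            if (e.getValue() > 1 && !REPEATABLE.contains(e.getKey()))
                errors.add("subfield $" + e.getKey() + " is not repeatable");
        return errors;
    }

    public static void main(String[] args) {
        // repeated $z with undefined second indicator: no errors
        System.out.println(validate('#', List.of('a', 'z', 'z')));
        // repeated $a and a defined second indicator: two errors
        System.out.println(validate('1', List.of('a', 'a')));
    }
}
```

in the generated editor such checks come from the grammar and constraint specification rather than hand-written code, which is what makes the editor adjustable by changing the specification alone.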
all the main characteristics of the editor were generated on the basis of the specification of constraints and extensions of the xtext grammar—therefore all changes to the editor can be realized by changing the specification. moreover, this editor can easily be adjusted for any database by using the concept of extensions and extension points in the eclipse plug-in architecture. we made the application independent of eclipse by using rich client platform (rcp) technology. the editor is implemented for the marc 21 bibliographic and holdings formats. user interface figure 1 shows the editor opened within eclipse. the main area is marked with “1”—it shows the marc 21 file that is being edited. that file contains one marc 21 bibliographic record. the tags of the fields and the subfield codes are highlighted in the editor, which contributes to presentation clarity. the area marked with “2” serves for listing the errors in the record, that is, nonvalid elements entered in the record. the area marked with “3” shows data about marc 21 in tree form. this part of the screen has two other possible views: a marc 21 holdings format tree and a navigator, which is the standard eclipse view for browsing the resources of the opened project. the actions available for creating a record are available in the cataloging menu and on the cataloging toolbar, which is marked with “4.” these are actions for previewing the catalog card, creating a new bibliographic record, loading a record from a database (importing the record), uploading a record to a database (exporting the record), and creating a holdings record for the bibliographic record. figure 1. eclipse editor for marc records in the eclipse editor for marc, selecting codes is enabled without opening additional dialogs or windows (figure 2). this is the standard eclipse mechanism for code completion: typing ctrl + space opens a dropdown list with all possible values for the cursor’s current position.
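the ctrl + space mechanism described above amounts to looking up the set of legal codes for the element under the cursor. a minimal sketch of such a lookup; the context keys and the code tables are tiny illustrative samples assumed for the example, not the full marc 21 code lists:

```java
import java.util.*;

/** sketch of a completion-proposal lookup like the one described for the
 *  editor: given the record element under the cursor, return the codes the
 *  dropdown should offer. the tables are small illustrative samples, not
 *  the full marc 21 code lists. */
public class CompletionSketch {
    private static final Map<String, List<String>> PROPOSALS = Map.of(
        // record status, leader position 5 (sample values)
        "leader/05", List.of("a - increase in encoding level",
                             "c - corrected or revised",
                             "n - new"),
        // first indicator of field 016 (sample values)
        "016/ind1", List.of("# - library and archives canada",
                            "7 - source specified in subfield $2"));

    public static List<String> proposalsFor(String context) {
        return PROPOSALS.getOrDefault(context, List.of());
    }

    public static void main(String[] args) {
        System.out.println(proposalsFor("016/ind1"));
    }
}
```

in the real editor these tables are derived from the grammar specification, so adding a code list requires no change to the completion machinery itself.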
figure 2. selecting codes record validation is done in real time, and every violation is shown while editing (figure 3). figure 3 depicts two errors in the record: one is a wrong value in the second character position of control field 008, and the other is that two 100 fields were entered, although this field cannot be repeated in a record. figure 3. validation errors rcp application of the cataloging editor as shown above, the editor is available as an eclipse plug-in, which raises the question of what a cataloger will do with all the other functions of the eclipse integrated development environment (ide). as seen in figures 1 and 3, there are a lot of additional toolbars and menus that are not related to cataloging. the answer lies in rcp technology, which generates independent software applications on the basis of a set of eclipse plug-ins.28 the main window of an rcp application with additional actions is shown in figure 4. besides the cataloging menu that is shown, the window also contains the file menu, which includes the save and save as actions, as well as the edit menu, which includes the undo and redo actions. all of these actions are also available via the toolbar. figure 4. rcp application conclusion the goal of this paper was to review current user interfaces for editing marc records. we presented the two basic approaches in this field and analyzed the advantages and disadvantages of each. we then presented the eclipse marc editor, which is part of the bisis library software system. the idea behind the editor is inputting structured marc data in a form similar to programming language editors. the author did not find this approach in the accessible literature. the rcp application of the presented editor will find practical application in future versions of the bisis system.
it represents an upgrade of the existing editor and a starting point for forming the version of the bisis system that will be based on marc 21. the acquired results can also be used for the input of other data into the bisis system, including data from the cris system used at the university of novi sad. this paper shows that eclipse plug-in technology can be used for creating end-user applications. development with the plug-in technology enables the use of a large library of ready-made components from the eclipse user interface, so that much source-code writing is avoided. additionally, the plug-in technology enables the development of extendible applications by using the concept of the extension point. in this way, we can create software components that can be used by a great number of different information systems. by using the concept of the extension point, the editor can be extended with functions that are specific to a data store. an extension point was created for the export and import of marc records, which means the marc editor plug-in can be used with any database management system by extending this extension point. future work on the eclipse marc editor is to implement support for the additional marc formats: authority data, classification data, and community information. these formats prescribe the same record structure but have different constraints on the content and different sets of fields and subfields, as well as different codes for character positions and subfields. therefore the appearance of the editor will remain the same; the only difference will be the specification of the constraints and the codes for code completion. another interesting topic for discussion is the implementation of other modules of library information systems in eclipse plug-in technology. references 1.
bojana dimić and dušan surla, “xml editor for unimarc and marc21 cataloging,” electronic library 27 (2009): 509–28; bojana dimić, branko milosavljević, and dušan surla, “xml schema for unimarc and marc 21 formats,” electronic library 28 (2010): 245–62. 2. library of congress, “marc standards,” http://www.loc.gov/marc (accessed february 19, 2011). 3. dimić and surla, “xml editor”; dimić, milosavljević, and surla, “xml schema.” 4. danijela tešendić, branko milosavljević, and dušan surla, “a library circulation system for city and special libraries,” electronic library 27 (2009): 162–68; branko milosavljević and danijela tešendić, “software architecture of distributed client/server library circulation,” electronic library 28 (2010): 286–99; danijela boberić and dušan surla, “xml editor for search and retrieval of bibliographic records in the z39.50 standard,” electronic library 27 (2009): 474–95. 5. branko milosavljević, danijela boberić, and dušan surla, “retrieval of bibliographic records using apache lucene,” electronic library 28 (2010): 525–36. 6. jelena rađenović, branko milosavljević, and dušan surla, “modelling and implementation of catalogue cards using freemarker,” program: electronic library and information systems 43 (2009): 63–76. 7. katarina belić and dušan surla, “model of user friendly system for library cataloging,” comsis 5 (2008): 61–85; katarina belić and dušan surla, “user-friendly web application for bibliographic material processing,” electronic library 26 (2008): 400–410; eurocris homepage, www.eurocris.org (accessed february 21, 2011). 8. dragan ivanović, dušan surla, and zora konjović, “cerif compatible data model based on marc 21 format,” electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1906945. 9.
eurocris, “common european research information format,” http://www.eurocris.org/index.php?page=cerifreleases&t=1 (accessed february 21, 2011); dragan ivanović et al., “a cerif-compatible research management system based on the marc 21 format,” program: electronic library and information systems 44 (2010): 229–51. 10. gordana milosavljević et al., “automated construction of the user interface for a cerif-compliant research management system,” the electronic library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1954429; dragan ivanović, dušan surla, and miloš racković, “a cerif data model extension for evaluation and quantitative expression of scientific research results,” scientometrics 86 (2010): 155–72. 11. gordana rudić and dušan surla, “conversion of bibliographic records to marc 21 format,” electronic library 27 (2009): 950–67. 12. holley r. lange, “catalogers and workstations: a retrospective and future view,” cataloging & classification quarterly 16 (1993): 39–52. 13. sarah yoder leroya and suzanne leffard thomas, “impact of web access on cataloging,” cataloging & classification quarterly 38 (2004): 7–16. 14. zahiruddin khurshid, “the cataloger’s workstation in the electronic library environment,” electronic library 19 (2001): 78–83. 15. library of congress, “marc standards,” http://www.loc.gov/marc (accessed february 19, 2011). 16. book systems, “concourse software product,” http://www.booksys.com/v2/products/concourse (accessed february 19, 2011). 17. koha library software community homepage, http://koha-community.org (accessed february 19, 2011). 18.
wendy osborn et al., “a cross-platform solution for bibliographic record manipulation in digital libraries” (paper presented at the sixth iasted international conference on communications, internet and information technology, july 2–4, 2007, banff, alberta, canada). 19. terry reese, “marcedit—your complete free marc editing utility,” http://people.oregonstate.edu/~reeset/marcedit/html/index.php (accessed february 19, 2011). 20. united nations educational, scientific and cultural organization, “isismarc,” http://portal.unesco.org/ci/en/ev.php-url_id=11041&url_do=do_topic&url_section=201.html (accessed february 19, 2011). 21. fernando j. gómez, “catalis,” http://inmabb.criba.edu.ar/catalis (accessed february 19, 2011). 22. polaris library systems homepage, http://www.gisinfosystems.com (accessed february 19, 2011). 23. library of congress, “marcmaker and marcbreaker user’s manual,” http://www.loc.gov/marc/makrbrkr.html (accessed february 19, 2011). 24. exlibris, “exlibris voyager,” http://www.exlibrisgroup.com/category/voyager (accessed february 19, 2011). 25. book systems, “concourse software product.” 26. bonnie parks, “an interview with terry reese,” serials review 31 (2005): 303–8. 27. eclipse.org, “xtext,” http://www.eclipse.org/xtext (accessed february 19, 2011). 28. the eclipse foundation, “rich client platform,” http://wiki.eclipse.org/index.php/rich_client_platform (accessed february 19, 2011).
methods of randomization of large files with high volatility 79 patrick c. mitchell: senior programmer, washington state university, pullman, washington, and thomas k. burgess: project manager, institute of library research, university of california, los angeles, california key-to-address conversion algorithms which have been used for a large, direct access file are compared with respect to record density and access time. cumulative distribution functions are plotted to demonstrate the distribution of addresses generated by each method. the long-standing practice of counting address collisions is shown to be less valuable in judging algorithm effectiveness than considering the maximum number of contiguously occupied file locations. the random access disk file used by the washington state university library acquisition sub-system is a large file with a sizable number of records being added and deleted daily. this file represents not only materials on order by the acquisitions section, but all materials which are in process within the technical services area of the library. the size of the file currently varies from approximately 12,000 to 15,000 items and has a capacity of 18,000 items. over 40,000 items are added and purged annually. each record consists of both fixed-length fields and variable-length fields. fixed fields primarily contain quantity and accounting information; the variable-length fields represent bibliographic data.
records are blocked at 1,000 characters for file-structuring purposes; however, the variable-length information is treated as strings of characters with delimiters. the key to the file is a 16-character structure which is developed from the purchase order number. the structure of the key is as follows: six digits of the original purchase order number, two digits of partial order and credit information, and eight digits containing the computed relative record address. 80 journal of library automation vol 3/1 march, 1970 proper development of this key turns out to be the most important factor in achieving efficiency in both file access time and record density within the file. the w.s.u. purchase order numbering system, developed from a basic six-digit purchase order number, allows up to one million entries. of these, the library currently uses four blocks: one block for standing orders, one block for orders originating from the university after the system becomes operational, another block used by the systems people in prototype testing of the system, and a fourth block which was given to one vendor who operates an approval book program. in mapping a possible million numbers into eighteen thousand disk locations, there is a high probability that the disk addresses for more than one record will be the same. disk location—also called disk address, home position, and relative record address (rra) in this paper—refers to the computed offset address of a record in the file, relative to the starting address of the file. currently, the file resides on an ibm 2316 disk pack which can store six 1,000-character records per track. thus if the starting address of the file is track 40, a record with rra = 5 would have its home position on track 40, while a record with rra = 6 would have its home position on track 41.
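the home-position arithmetic described above can be sketched in a few lines; the starting track and blocking factor below are the values quoted in the text:

```java
/** sketch of the home-position arithmetic described in the text:
 *  six 1,000-character records fit on one ibm 2316 track, so a record's
 *  home track is the file's starting track plus rra / 6. */
public class HomePosition {
    static final int RECORDS_PER_TRACK = 6;

    public static int homeTrack(int startTrack, int rra) {
        return startTrack + rra / RECORDS_PER_TRACK;
    }

    public static void main(String[] args) {
        // the examples from the text: the file starts at track 40
        System.out.println(homeTrack(40, 5)); // rra = 5 -> track 40
        System.out.println(homeTrack(40, 6)); // rra = 6 -> track 41
    }
}
```

this is also why the routines need never compute an absolute track address themselves: the rra alone, together with the file's starting address, determines the home position.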
it should be noted that the routines in this system are required to calculate neither absolute track addresses nor relative track addresses, and therefore the file could be moved to any direct access device supported by os/bdam without program modification. when two records map into the same address, it is called a collision. for a write statement under the ibm 360 operating system basic direct access method, the system locates the disk address generated, and if another record is found there, it sequentially searches from that point forward until a vacant space is found and then stores the new record in that space. the sequential search is done by a hardware program in the i/o channel and proceeds at the rotational speed of the device on which the file resides. the cpu is free during this period to service other users. similarly, when searching for a record, the system locates the disk address and matches keys; if they do not match, it sequentially searches forward from that point. long sequential searches sharply degrade the operating efficiency of on-line systems. in initial experimentation with this file, it was discovered that some records were 2,500 disk positions away from their computed locations. this seriously reduced response time to the terminals which were operating against those records. the necessity of developing a method for placing each record close to its calculated location became quite obvious. however, the methodology for doing this was not as clear. the upper-bound delay for a direct access read/write operation can be defined as the largest number of contiguously occupied record locations within the file. the problem of minimizing this upper bound for a particular file is equivalent to finding an algorithm which maps the keys in such a way that unoccupied locations are interspersed throughout the file space. one method for doing this is to triple the amount of space required for the file.
this has been a traditional approach but is unsatisfactory in terms of its efficiency of space utilization. the method first used by the library was motivated by the necessity to "get on the air." its requirements were that it be easily implemented and perform to a reasonable degree. the prime modulo scheme seemed to qualify and was selected. under this algorithm, the largest prime number within the file size was divided into the purchase order number and the modulo remainder was used as an address; that is, rra = po modulo pr, where rra is the relative record address, po is the purchase order number, and pr is a prime number. during the initial period the file grew to about 8,000 records. because the acquisitions section was converting from its manual operation, the file continued to grow in size and the collision problem became pronounced. when the file reached about 70% capacity (that is, when 70% of the space allocated for the file was occupied by records), this method became unusable; records were then located so far from their original addresses that terminal response times became degraded and batch process routines began to show significant increases in run times. with no additional space available to expand the size of the file, it became necessary to increase the record density within the existing file bounds. therefore an adaptation of the original algorithm was developed.
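a quick sketch of the prime modulo scheme shows why it clusters badly on this kind of input: sequential purchase order numbers produce sequential addresses, which pack records into contiguous runs. the prime chosen below is illustrative, not the one actually used.

```python
# Sketch of the prime modulo scheme: rra = po mod pr. With sequential
# purchase order numbers the generated addresses are themselves
# sequential, so records pack into contiguous runs.
PR = 17989  # a prime within the ~18,000-location file size; illustrative

def rra_prime(po: int) -> int:
    return po % PR

addrs = [rra_prime(po) for po in range(600000, 600010)]
# sequential input -> sequential addresses, i.e. densely clustered
assert addrs == list(range(600000 % PR, 600000 % PR + 10))
```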
in addition to generating the original number by dividing a prime number into the purchase order number and keeping the modulo remainder, the purchase order number was multiplied by 300 and divided by that same prime number to get an additional modulo remainder; the latter was added to the first modulo remainder and the sum then divided by 2: rra = [(po modulo pr) + (300 · po modulo pr)] / 2. again this scheme brought some relief, but the file continued to grow as the system was implemented, and it became obvious that this procedure would also fail because of over-crowded areas in the file. a search of the literature, using w. b. climenson's chapter on file structure (2) as a start, provided some other methods for reducing the collision problem (1, 3, 4, 5, 6). several randomization or hashing schemes were examined; however, none of these methods appeared to be particularly pertinent to the set of conditions at washington state. in order to bring relief from the continuing problem of file and program maintenance involved with changing the file-mapping algorithm, research was initiated to devise an algorithm which would, independent of the input data, map records uniformly across the available file space. the algorithm which resulted utilizes a pseudo-random number generator, rand (7), developed at the w.s.u. computing center (randl, program 360l-13.5.004, computing center library, washington state university, pullman, washington). the normal use of rand is to generate a sequence of uniformly distributed integers over the interval [1, m], where m is a specified upper bound in the interval [1, 2^31 - 1]. in addition to m, rand has a second input parameter, n, which is the last number generated by rand. given m and n, rand generates a result r.
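the adapted averaging formula can be written out directly; integer division stands in for whatever truncation the original program performed, and the prime is again illustrative.

```python
# The adapted algorithm described above: average of the plain modulo
# remainder and a scaled (x300) modulo remainder.
PR = 17989  # illustrative prime within the file size

def rra_adapted(po: int) -> int:
    return ((po % PR) + (300 * po % PR)) // 2

# each remainder is < PR, so their average is also a valid address < PR
assert all(0 <= rra_adapted(po) < PR for po in range(600000, 600100))
```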
rand is used by the algorithm to generate relative disk addresses by setting m to the size or capacity of the file, setting n to the purchase order number of the record to be located, and using r as the relative address of the record: rra = rand(po, m). in order to test the effectiveness of this algorithm and others which might be devised, a file simulation program, bdamsim, was written (program 360l-06.7.008, computing center library, washington state university, pullman, washington). inputs to this program are: a) an algorithm to generate relative record locations; b) a sequential file which contains the input data for "a"; c) various scalar values such as file capacity, approximate number of records in the file, title of output, etc. the program analyzes the numbers generated by "a" operating on "b" within the constraints of "c". the outputs of the program are some statistical results and a graphical plot showing the cumulative distribution function of the generated addresses. figures 1, 2, and 3 show the plotted output of the three algorithms operating against the current acquisitions file; the abscissas of the plots are the relative record addresses (x 10^2). [figures not reproduced: fig. 1. rra = po modulo pr. fig. 2. rra = ((po modulo pr) + (300 x po modulo pr)) / 2. fig. 3. rra = rand(po, pr).] while any abandoned cluster (14,692,237 out of 24,030,176!) was erroneously described as follows: this xml empty statement omits the specific information about the abandoned cluster.
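a rough analogue of the simulation program just described can be sketched in a few lines: feed it an address-generating algorithm and a stream of keys, and report how the generated addresses distribute over the file space. all names and sizes below are illustrative; the original bdamsim was an os/360 program.

```python
# Minimal bdamsim-style analyzer: given an algorithm "a", input keys
# "b", and a file capacity "c", report occupancy and the longest
# contiguous occupied run (the upper-bound delay defined earlier).
from collections import Counter

def simulate(algorithm, keys, capacity):
    hits = Counter(algorithm(k) % capacity for k in keys)
    longest_run = run = 0
    for slot in range(capacity):
        run = run + 1 if hits[slot] else 0
        longest_run = max(longest_run, run)
    return {"occupied": len(hits), "longest_run": longest_run}

# sequential keys under a prime modulo scheme form long contiguous runs
stats = simulate(lambda k: k % 101, range(500, 520), capacity=101)
assert stats == {"occupied": 20, "longest_run": 15}
```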
to obtain this invaluable information again, we filed a bug by email.29 the decision taken was drastic: starting in may 2020, viaf stopped including this information in its monthly dump, as stated at the bottom of the page itself.30 as a result, the only recourse available to viaf contributors, or to any other institution that would synchronize their authority records with viaf identifiers, is to rely on an external identification tool such as wikidata!
[information technology and libraries, june 2021. beyond viaf | bianchini, bargioni, and pellizzari di san girolamo]
materials and methods
any comparison between viaf and wikidata must consider their different content. viaf contains personal name clusters, corporate name clusters, geographic name clusters, and work clusters, whereas wikidata allows items to describe any kind of entity relevant in the universe of discourse of the users' data, irrespective of its bibliographic nature. even if all kinds of viaf clusters are relevant for bibliographic control, this study is limited to the analysis of personal name clusters in viaf and of items having "instance of: human" (p31:q5) in wikidata, because they are by far the most represented in viaf and they can be directly compared.31 some entities, such as mythological persons, legendary persons, etc., that are personal clusters in viaf are not treated as humans in wikidata and belong to other instances (e.g., https://www.wikidata.org/wiki/q95074). a double approach was used to compare viaf and wikidata: first, data analyses of viaf and wikidata were performed, to compare viaf clusters and wikidata items and to investigate their reciprocal relationships (see the data analysis section).
second, a comparison of several general characteristics, such as scope, objectives, philosophy, authority control, and identification, was made based on the respective websites and the available literature to find and highlight differences and similarities. full viaf dumps are available in native xml, rdf, marc-21 xml, or iso-2709 marc-21 (http://viaf.org/viaf/data/). viaf clusters were analyzed using an xml dump published on september 6, 2020 (http://viaf.org/viaf/data/viaf-20200906-clusters.xml.gz). full wikidata dumps are available in xml, json, or rdf.32 however, given the size of the entire dataset, it is much more convenient to create customized rdf dumps using the tool wdumper (https://wdumps.toolforge.org/). all the information (settings, dimension, and date of base dump) about dumps created using wdumper remains traceable (https://wdumps.toolforge.org/dumps). wikidata items were analyzed using a customized rdf dump updated to september 14, 2020 (https://wdumps.toolforge.org/dump/732). the customized dump contains all statements with non-deprecated values33 present in items having both "instance of: human" (p31:q5) in best rank and at least one value of "viaf id" (p214) in best rank. both dumps were parsed using three perl scripts. dumps and scripts were uploaded to zenodo and are all available for analysis and reuse.34 the perl scripts generate json data that are published on the html page http://catalogo.pusc.it/beyond_viaf/, where they are interpreted by javascript scripts in order to populate eight tables: three dedicated to viaf (tables 1–3) and five to wikidata (tables 4–8).
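the authors' perl scripts are not reproduced in the text; purely as an illustration, this is the kind of per-cluster tally such a parser performs on a viaf-style xml dump. the element names below are simplified assumptions, not the actual viaf schema.

```python
# Illustrative tally of sources per cluster, the statistic behind the
# "isolated cluster" counts discussed below (clusters with one id).
import xml.etree.ElementTree as ET

sample = """<clusters>
  <cluster id="100000001"><source>LC</source><source>DNB</source></cluster>
  <cluster id="100000002"><source>ISNI</source></cluster>
</clusters>"""

root = ET.fromstring(sample)
counts = {c.get("id"): len(c.findall("source")) for c in root}
isolated = [cid for cid, n in counts.items() if n == 1]
assert counts == {"100000001": 2, "100000002": 1}
assert isolated == ["100000002"]
```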
in order to select the statements to be analyzed in wikidata items, three sets of relevant properties were found through three distinct sparql queries at the end of september 2020: viaf members (table 5), authority controls related to libraries but not being viaf members (table 6), and biographical dictionaries (table 7).35 at the beginning of october 2020, another sparql query was performed to find all the personal items containing the authority controls related to libraries but not being viaf members (table 6, column 4), without filtering the search to personal items having at least one value of "viaf id" (p214).36
data analysis: viaf clusters and wikidata items
for this paper, two different versions of the data tables were produced: the first version, available at http://catalogo.pusc.it/beyond_viaf/, is a full, commented, and dynamic version of all the tables. within that version, links to the acronyms (such as lc, dnb, sudoc, etc.) of all the viaf contributors and other data providers are available too. static versions of these tables are included in this paper with commentary.
viaf
viaf has 22,099,715 personal clusters, half of which (50.90%; table 1, col. 2) are isolated clusters (i.e., they contain only one id).
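the exact sparql queries used for the selection described above are not reproduced in the text; the sketch below shows the general shape such a query takes, selecting items with "instance of: human" (p31:q5) and a "viaf id" (p214), built as a string so it can be sent to any sparql endpoint.

```python
# Hedged sketch of a wikidata sparql query for personal items having a
# viaf id; the authors' actual queries selected property sets and are
# not shown in the text.
def humans_with_viaf_query(limit: int = 10) -> str:
    return f"""
SELECT ?item ?viaf WHERE {{
  ?item wdt:P31 wd:Q5 ;    # instance of: human
        wdt:P214 ?viaf .   # viaf id
}} LIMIT {limit}"""

q = humans_with_viaf_query()
assert "wdt:P31 wd:Q5" in q and "wdt:P214" in q
```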
the presence of isolated clusters is interesting because it means that those clusters are created based on data coming from just one source. what is more, the percentage of isolated clusters is much higher (71.19%; table 1, col. 12) if just viaf contributors are taken into account (i.e., excluding isolated clusters due to data from other data providers, such as isni). it is worth noting that other data providers can form isolated clusters, with the relevant exception of wikidata (for which viaf uses the acronym wkp), which never appears in isolated clusters (table 1, cols. 7 and 8).
table 1. viaf personal clusters by number of sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb1]
the total number of ids present in viaf clusters is 51,327,847 (table 2), distributed in 22,099,715 clusters; the most relevant contributors include lc (7,266,628 ids), dnb (5,677,731 ids), sudoc (3,278,189 ids), and nta (2,754,036 ids), while the most relevant other data providers are isni (8,455,814 ids) and wkp (2,148,680 ids) (table 2). apart from lc and dnb, data about isolated clusters (table 2, col. 5) show that the number of isolated clusters tends to decrease slowly over time and that clustering has improved: recently added sources tend to have a higher share of isolated ids. another relevant figure is that sources in non-latin alphabets usually have higher shares of isolated ids.37 thus, a high number of isolated clusters may reveal a source whose records still partially need to be gathered into existing clusters.
table 2.
viaf personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb2]
the histories of viaf clusters, as contained in the xml dumps, appear weird and incoherent. for example, many viaf contributors in their first year of appearance seem to have no additions and many removals (e.g., the bav row; for complete information see table 3 on the website at http://catalogo.pusc.it/beyond_viaf/#tb3). this incoherence is due to the absence of redirected and abandoned clusters in the data. nevertheless, the histories allow us to reconstruct the year of first contribution of each source (information otherwise unavailable) and to detect major changes in the data provided to viaf by each source.38
table 3. viaf history of personal clusters by source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb3]
wikidata
wikidata has 8,304,947 personal items, 2,061,046 of which contain a viaf id. usually one or more viaf sources are extracted from the viaf id(s), so that 1,905,470 personal items containing a viaf id have at least one viaf source id (table 4, col. 1). wikidata records ids from a wide range of other resources, such as non-viaf bibliographic agencies and biographical dictionaries (investigated in these tables), but also encyclopedias and various online databases. considering the 2,061,046 items containing a viaf id, 684,367 items contain only one viaf source id (table 4, col. 1), but only 353,710 items contain only one id among viaf source ids, non-viaf source ids, and biographical dictionary ids (table 4, col. 15); so, more than 300,000 items containing only one viaf source id have at least one non-viaf source id and/or one biographical dictionary id.
table 4. wikidata personal items (pers. it.)
by number of ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb4]
viaf and wikidata: a data comparison
from a quantitative perspective, wikidata personal items (8,304,947) are 37.58% of viaf personal clusters (22,099,715), while wikidata personal items having a viaf id (2,061,046) are 9.26%. ids from viaf sources present in wikidata personal items containing a viaf id (6,292,778; table 5, col. 3) are 12.91% of the ids present in viaf personal clusters (48,740,933; table 5, col. 4). in the authors' opinion, a quantitative comparison between viaf and wikidata must be considered carefully. it could be argued that this is a noticeable disadvantage of wikidata with respect to viaf, but that would be right only from a bibliographic control perspective, and the other side of the coin must be examined too. as wikidata represents any kind of entity relevant for its users (libraries, archives, museums, and many other stakeholders), viaf covers only about a quarter of wikidata personal items. furthermore, a very large part of the personal entities represented in wikidata (at present, more than 6,200,000, i.e., about 75%) cannot rely on viaf for identification purposes (for example, because wikidata personal items can also represent singers, lawyers, pilots, and so on). it can be concluded that, in the domain of the semantic web and with respect to the objectives of wikidata, viaf can be considered just one specialized source. considering single viaf sources, wikidata surpasses viaf by number of ids in only two cases, perseus (135.18%) and simacob (102.17%) (table 5, col. 5). this is possible because wikidata and viaf gather different sets of data from both sources: the former uses sets of data obtained by its users, while the latter uses only data sent by the contributor. all the other sources, because of the absence of systematic imports, are much rarer in wikidata than in viaf.
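the headline percentages above follow directly from the counts reported in the text and can be checked with a few lines of arithmetic:

```python
# Figures taken from the text; the percentages are simple ratios.
viaf_clusters = 22_099_715     # viaf personal clusters
wd_personal_items = 8_304_947  # wikidata personal items
wd_source_ids = 6_292_778      # viaf-source ids in wikidata items (table 5)
viaf_ids = 48_740_933          # ids in viaf personal clusters (table 5)

assert round(100 * wd_personal_items / viaf_clusters, 2) == 37.58
assert round(100 * wd_source_ids / viaf_ids, 2) == 12.91
```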
table 5. wikidata personal items (pers. it.) by viaf source [adapted from http://catalogo.pusc.it/beyond_viaf/#tb5]
table 6 and table 7 show authority control in wikidata leaving aside viaf. wikidata contains some non-viaf sources (usually non-national libraries or groups of libraries which could not become viaf contributors); their ids in personal items having a viaf id (894,161) are 86.04% of their ids in all personal items (958,206; table 6, col. 4), meaning that wikidata provides a clusterization for more than 64,000 ids (6%) probably corresponding to non-existent viaf clusters (table 6, totals).
table 6. wikidata personal items (pers. it.) by non-viaf sources [adapted from http://catalogo.pusc.it/beyond_viaf/#tb6]
table 7. wikidata personal items (pers. it.) by biographical dictionary [adapted from http://catalogo.pusc.it/beyond_viaf/#tb7]
in general, the presence of the ids of biographical dictionaries (796,609 ids in total) in 725,755 personal items having a viaf id helps significantly in the definition of authoritative dates of birth and death (table 7, total of column 2, and table 4, total of column 12).
a comparison between table 1, column 7, and table 2, row wkp (the acronym for wikidata wrongly used by viaf) shows that 2,147,319 clusters contain 2,148,680 wkp ids; it means that, from a viaf point of view, wikidata duplicates are only 1,361. furthermore, a comparison between the total and row 0 in table 8, col. 1, shows that 2,061,046 items contain at least one viaf id and that 2,037,638 items contain exactly one viaf id; so, items containing one or more viaf duplicates number 23,408. as a result, it can be concluded that the percentage of duplicates in wikidata is less than 0.01% and in viaf is about 0.01%, so wikidata is as trustworthy as viaf. viaf and wikidata are not only able to discover reciprocal duplicates, but also duplicates in viaf sources, as shown by a comparison between table 8, col. 3 (the total number of cases in which a viaf source has at least one duplicate) and table 8, col. 5 (the total number of cases in which viaf sources are duplicated). however, while duplicates recorded by viaf are findable only by querying the monthly dumps using in-house programs, duplicates discovered by wikidata are easily findable through sparql queries detecting single-value constraint violations.
table 8. wikidata personal items (pers. it.) by repeated viaf sources and viaf source ids [adapted from http://catalogo.pusc.it/beyond_viaf/#tb8]
discussion
viaf and wikidata are quite different in their purpose, scope, organizational and theoretical approach, and data harvesting and management.
a major difference between viaf and wikidata is in their purpose. on the one hand, viaf aims to identify bibliographic entities and to connect authority data provided by selected contributors (national libraries, cultural agencies, and other major institutions) and extracted from other data providers (such as isni, rism or de663, wikidata, etc.) through the creation of clusters by means of software. on the other hand, like isni, wikidata focuses on both identification and description of entities and has the purpose of collaboratively building a database concerning the sum of all relevant knowledge (provided that each item complies with its notability criteria) using a crowdsourced approach (https://www.wikidata.org/wiki/wikidata:notability). another relevant difference between viaf and wikidata is their scope: while viaf aims to identify a few selected types of entities already described within the bibliographic universe by national agencies, wikidata aims to identify and describe any kind of entity of interest for the wikidata community. wikidata items may exist for any kind of entity and may contain a very broad range of data and of external identifiers.
so, wikidata can represent bibliographic data and entities (e.g., at present wikidata records data for 54% of all the bibliographic sources cited in wikipedia entries), any other kind of entity provided for in viaf (i.e., agents, works, expressions, and places), any other entity defined by the frbr-ifla lrm model (e.g., manifestations, items, timespans, nomens, res, etc.), and entities from other models relevant for the glam universe (such as frbroo and cidoc).39 but it is open to any data model, because it can also include any kind of entity outside the bibliographic or cultural heritage universe: it is a knowledge base capable of containing any kind of statement on any entity users want to describe. in addition, for any kind of entity there is no minimum or maximum number of statements that must or can be added; as soon as an entity is clearly identified, it can be added to wikidata. moreover, when missing, new identifiers (and properties for description) can be proposed by anyone through property proposals and, if well defined, they are usually approved within two weeks (https://www.wikidata.org/wiki/wikidata:property_proposal). a broader scope is supposed to be much more convenient for users who wish to discover previously unknown links and information in the semantic web.
organizational model
due to the viaf top-down approach, data is completely managed by oclc, with no chance for common users or for medium and small libraries or other institutions to directly improve viaf clusters (e.g., by adding other data coming from their collections or from encyclopedias or online databases, merging duplicates, solving conflations, etc.). as the wikidata approach is "to crowdsource data acquisition, allowing a global community to edit the data," data is curated directly by users interested in its creation and use.40 so, in wikidata, data is produced by volunteers, by means of semiautomatic or manual data harvesting from any desired and available source.
moreover, users' statistics show that authoritative data from national bibliographic agencies and other libraries, archives, and museums are normally uploaded by common users, not by librarians (or any other kind of institutional data curator).41
identification function
the theoretical approach differs too, both as to the form of the names and as to the identification function. in viaf, preferred and variant forms of names for persons are based on national cataloguing codes. because national codes differ, viaf is needed, and it works as a neutral hub of all the national preferred forms. cataloguing rules can assure uniformity and univocity to the forms of the names of the entities within a national catalogue, but they are quite complicated for users to understand and use. in ranganathan's words, "the cataloguing conventions are on the surface quite contrary to what mr. everybody is familiar with."42 in contrast, preferred forms in wikidata are based on the international principles of the convenience of the user and common usage.43 a clear example is the use of the direct form of name (jane doe) instead of the inverted form (doe, jane). a different usage in the forms of names could be an issue for the integration of library metadata in wikidata. in practice, however, it is not. first, there is no conflict between the wikidata form and any other form from a theoretical point of view, as the wikidata form is already treated in viaf as the preferred form within its specific context.44 in addition, wikidata accepts any library identifier, so that any library-controlled form can be linked to a wikidata item and vice versa. furthermore, a wikidata bot could be programmed to dump authorized and variant access points from national authority files and add them to the item labels and aliases.45 lastly, it could be argued that national cataloguing codes are compliant with the icp principles and with the convenience of the user and common usage; but a remarkable difference is that, while in national codes the principles are applied by cataloguers for users, in wikidata they are expressed directly by the users themselves. as the identification function is a major feature of the semantic web, the different approaches of viaf and wikidata to this issue must be underlined. as noted, "viaf remains neutral towards differences in the cataloguing policy of its data contributors" and, for this reason, viaf accepts all ids provided by its sources, even when they are not clearly identifiable entities but are just labels (see for example https://viaf.org/viaf/307171748 or https://viaf.org/viaf/305052259).46 on the contrary, wikidata explicitly requires each item to refer to "a clearly identifiable conceptual or material entity" (second notability criterion; https://www.wikidata.org/wiki/wikidata:notability). as a consequence, many isolated clusters formed by viaf on the basis of single contributors' ids related to not-clearly-identifiable entities are not acceptable in wikidata and remain unlinked. moreover, data on cluster duplication show that identification in wikidata is performed at the same quality level as in viaf. clusters for identification purposes are created both in viaf and in wikidata; but, unlike in viaf, in wikidata external identifiers (like all the other data) are not provided in a structured way by national libraries or other institutions (with very few exceptions); instead, identifiers are usually found and added by common users through web scrapers and after data cleaning. what is more, matches are not performed automatically but semiautomatically (through tools such as openrefine or mix'n'match; https://openrefine.org/ and https://mix-n-match.toolforge.org/) or manually.
an enhanced feature of wikidata in clusterization is that it records a wider variety of sources and their ids: due to its openness, wikidata refers to viaf and its sources, but also to any other library or cultural institution, and to a large number of reference sources such as encyclopedias and biographical dictionaries (table 7). a wider variety of identification sources and manual work assure a higher level of identification.
data quantity
data harvesting affects both the quantity and the quality of data. in viaf, data are collected from periodical contributions of viaf participants, with very large sets of data. therefore, from a quantitative point of view, viaf has a far larger number of people (22,099,715 personal clusters) in comparison with wikidata (8,304,947 personal items). even though wikidata was created in 2012, the number of personal items in wikidata is currently only just over a third (37%) of all viaf personal clusters. although the quantities are not directly comparable due to the different universes to be described, in the last few years initiatives to enhance organized cooperation between libraries and wikidata and to promote data production in wikidata have been increasing. a very high-quality initiative is supported by cornell university, harvard university, stanford university, and the university of iowa's school of library and information science, in collaboration with the library of congress and the program for cooperative cataloging (pcc).
their linked data for production (ld4p) wikidata project is "an in-depth exploration of how wikidata could serve as a platform for publishing, linking, and enriching library linked data" (https://www.wikidata.org/wiki/wikidata:wikiproject_linked_data_for_production). an additional example is the ifla wikidata working group, formed "to explore and advocate for the use of and contribution to wikidata by library and information professionals, the integration of wikidata and wikibase with library systems, and alignment of the wikidata ontology with library metadata formats such as bibframe, rda, and marc" (https://www.ifla.org/node/92837). even so, wikidata is still very far from having a structured workflow to ingest data from national or local libraries, museums, and archives. while the projects mentioned above are mainly dedicated to explaining to librarians and institutions why wikidata is important and how to contribute to it, there are still very few projects mainly dedicated to the concrete, massive synchronisation of library and bibliographic data with wikidata; such projects also require a relevant effort in the manual cleaning of discrepancies and oddities emerging from the synchronisation. relevant exceptions are the national library of wales47 and the biblioteca europea di informazione e cultura, where significant work has been done to synchronise the respective databases of authors (and of other types of entities) with wikidata.48
data quality
data quality also needs to be analyzed in detail.
even if data from national libraries are authoritative and of high quality, as a virtual file viaf neither has nor produces its own data. consequently, viaf data do not always remain authoritative, because errors can be both inherited and added, and clusters can be duplicated. the issue is well known to isni, which "whenever necessary [. . .] splits and merges data coming from viaf, and even applies protection to data that has been fixed manually."49 as shown in table 2 and table 8, viaf clusters are subject to isolation and duplication when they are created, and to many changes and updates when they are maintained. so, even if viaf collects a huge amount of authoritative data and creates clusters of ids, viaf users cannot always safely and continuously rely on them. data flows in just one direction (from national libraries to viaf), viaf deletes and rebuilds clusters without giving priority to the stability of one cluster over another, and, after april 2020, viaf no longer makes available to users a record of its changes.50 on the contrary, wikidata data is always under the strict control of its users, as its structure is designed to trace any minimal change to the data. every single addition or deletion is documented, not just to easily recover from eventual vandalism, but also to support any decision with clear evidence. at any moment, any stakeholder can know exactly whether, how, when, and why data changed. what is more, from a qualitative point of view, wikidata seems to offer a better solution for the recording of authority data than viaf. first, it can store a wider variety of data about a person in a more semantic way: not only is it possible in wikidata to express preferred and variant forms of the name, related names, works, co-authors, publication statistics, and other data about the person (as in viaf), but all these data are expressed in a semantic way.
for example, whereas in viaf "bach, anna magdalena" is just a related name of johann sebastian bach, in wikidata she is recorded and qualified as the person who married the musician. thanks to that different approach, wikidata can represent and show bach's full genealogical tree (https://magnus-toolserver.toolforge.org/ts2/geneawiki/?q=q1339). as adamich noted, "building graphs from bibliographic entities is really about making the data machine readable and understandable. it is about making the data web enabled. in terms of translation, linked data opens up a whole new world over our marc entrapment."51 quality is enhanced by matching methods too; whereas viaf matches identities by an algorithm based on explicit identifiers or string matching (such as the forms of the name, dates, and bibliographic relationships),52 wikidata matches are usually decided by a human, the user, or (in the case of semiautomatic imports) at least checked a posteriori by a human after some time. the higher precision of manual over automatic matching is also recognized in the viaf guidelines.53 furthermore, as seen above, notability requires that, when clear identification is impossible, no item be created in wikidata.

data maintenance and usability
data quality also relies on maintenance. a comparison between wikidata items and viaf clusters shows a very small but constant presence of errors to be fixed in both (around 0.01%), even if it is impossible to determine with certainty whether viaf uses wikidata's error pages.
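the contrast between viaf's algorithmic matching and wikidata's human-checked matching can be made concrete with a toy heuristic: shared identifiers decide first, with string similarity of names (plus a date check) as a fallback. this is a hedged sketch only: the record fields, the date check, and the 0.9 threshold are assumptions for illustration, not viaf's actual algorithm.

```python
# Toy illustration of identifier-first, then string-based matching of
# authority records. This is NOT VIAF's real algorithm; field names,
# sample identifiers, and the similarity threshold are illustrative.
from difflib import SequenceMatcher

def match(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Return True if two authority records likely describe the same agent."""
    # 1. Explicit shared identifiers (e.g., an ISNI) win outright.
    shared = set(rec_a.get("ids", [])) & set(rec_b.get("ids", []))
    if shared:
        return True
    # 2. Otherwise require compatible dates (when both records carry them)
    #    and fall back to string similarity of the preferred name.
    if rec_a.get("dates") and rec_b.get("dates") and rec_a["dates"] != rec_b["dates"]:
        return False
    ratio = SequenceMatcher(None, rec_a["name"].lower(), rec_b["name"].lower()).ratio()
    return ratio >= threshold

bach_dnb = {"name": "Bach, Johann Sebastian", "dates": "1685-1750", "ids": ["isni:example"]}
bach_lc  = {"name": "Bach, Johann Sebastian", "dates": "1685-1750", "ids": []}
wrong    = {"name": "Bach, Anna Magdalena",   "dates": "1701-1760", "ids": []}

print(match(bach_dnb, bach_lc))  # True: identical names, compatible dates
print(match(bach_dnb, wrong))    # False: dates differ
```

a heuristic of this kind conflates homonyms and splits variant forms exactly as described for viaf clusters above, which is why a posteriori human checking, as practiced on wikidata, raises precision.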
issues with fixing viaf errors directly through viaf contributors have already been noted: "while clustering anomalies can be handled by viaf itself, reporting errors found in source data of viaf partners raise problems related to the efficiency of the notification workflows. at this point, involvement of viaf partners themselves in the process is needed."54 on the other hand, in wikidata anyone can edit items, add new data or delete mistakes, merge items, fix various issues, and so on, on the fly. due to its openness, wikidata may also suffer from vandalism, but it has its own solutions.55 along with this, data receive special attention as to their accuracy and reliability, because they are uploaded and maintained by users who are direct stakeholders. for this reason, in wikidata, references to bibliographical or biographical sources and to other data-provider ids, such as any national and international identification system, are suggested, promoted, and carefully examined. moreover, there is a commitment to monitor the consistency of viaf clusters. the ability of wikidata to identify inconsistent viaf clusters, and the fact that viaf isolated clusters can be reduced by at least 30%56 by referring to identifiers from wikidata and other data providers, are the best demonstration of the quality of its data and of the importance of the other data providers in viaf clusterization. as to the usability of data, the internal search of viaf offers little more than basic functions: the only available filter allows results to be limited to clusters having one specific source; on the contrary, filters for clusters having and/or not having a specific group of sources, or for clusters having more or fewer sources, would be very useful, especially in order to find duplicates.
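wikidata's public sparql endpoint supports precisely this kind of source-filtered search. a minimal sketch of composing such a query in python: p213 (isni) and p214 (viaf id) are real wikidata properties, but the query itself is an illustrative assumption of the authors' described workflow and is only printed here, not sent.

```python
# Build a SPARQL query for the Wikidata Query Service that finds persons
# carrying one identifier (here ISNI, P213) but lacking another (here
# VIAF ID, P214) -- the "has source A and/or not source B" filter that
# VIAF's own search interface lacks. No network request is made.
ENDPOINT = "https://query.wikidata.org/sparql"  # public endpoint, unused here

def source_filter_query(having: str, lacking: str, limit: int = 10) -> str:
    """Return a SPARQL query for humans with `having` but without `lacking`."""
    return f"""
    SELECT ?item ?itemLabel WHERE {{
      ?item wdt:P31 wd:Q5 .                      # instance of: human
      ?item wdt:{having} ?id .                   # has the first identifier
      FILTER NOT EXISTS {{ ?item wdt:{lacking} ?x . }}   # lacks the second
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """

query = source_filter_query("P213", "P214")
print(query)
```

sent to the endpoint with any http client, such a query would list persons carrying an isni but no viaf id; swapping in other property numbers reproduces variations like the iccu example cited in note 57.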
in contrast, wikidata has a sparql query service which returns results based on the current status of the database, and its internal search can integrate some of the functions of the query service, allowing users to look for items having and/or not having specific statements (https://www.wikidata.org/wiki/special:search).57 considering cases in which viaf and wikidata discover potential duplicates in their sources, viaf has no page dedicated to listing cases of (supposedly) duplicate ids from its sources, while wikidata makes it easy to find cases in which single sources have (supposedly) duplicate ids, through constraint violations58 and appropriate sparql queries.

a comparison table
a comparison table was built to compare scope, role, system, and functions between viaf and wikidata, inspired by and adapted from a viaf vs. isni comparison.59

table 9. comparison between and complementarity of viaf and wikidata features

scope
viaf: persons; organizations; works; expressions; locations
wikidata: any kind of viaf entity; any "res" of ifla lrm; any entity of cidoc; any other non-glam entity; any entity in the universe of discourse

software
viaf: unknown
wikidata: wikibase60

data. person entity properties
viaf: preferred form of name, based on national cataloguing rules; very rich variant forms of name, identified by national agencies; sources
wikidata: preferred form of name (label), based on convenience of the user and common usage61; variant forms of name (aliases), organized by languages and scripts62; sources (as statements and references, and with qualifiers)

data. quantity (persons)
viaf: number of clusters: 33,656,281 (sept. 2020); number of personal clusters: 22,099,715 (sept. 2020)
wikidata: number of entities: 90,260,081 (oct. 2020); number of personal items: 8,304,947 (oct. 2020); number of personal items with viaf id: 2,061,046 (sept. 2020)

data. harvesting
viaf: data are provided by authoritative national bibliographic agencies
wikidata: data are added through massive semiautomatic imports and/or manually by any interested user

data. quality
viaf: data are granted by authoritative national bibliographic agencies
wikidata: data are controlled by any directly interested user, based on data from viaf, available bibliographic agencies, and other authoritative bibliographic sources

data. other entities properties
viaf: isbn, titles, dates included in the cluster; dates, genre, bibliographic references from sources, xlinks, etc.; properties are unchangeable
wikidata: any kind of property applicable to an entity can be used (multimedia included)63; all statements admit references, which are strongly recommended in some cases; unavailable properties can be freely added through a process of property proposal64

data. dates
viaf: dates are extracted from authority and bibliographic records using a parsing technique; calendars and precision are not available65
wikidata: dates are imported semiautomatically from various sources or filled in manually; different calendars are available and further statements can be made through qualifiers66

data. vandalism
viaf: no vandalism: data are editable only by oclc
wikidata: everyone can edit, but items which are frequently vandalized can be temporarily or permanently protected from the edits of unregistered users67

data. fixing errors, deduplicating, or unmerging clusters/items
viaf: suggestions and requests via email; asynchronous; presumably, automated processes and human interventions; viaf rebuilds clusters and does not give priority to the stability of one cluster over another68
wikidata: everyone can edit69; instantaneous; probable errors (constraint violations) are detected in an automated way (by bots and through queries); pages with lists of probable errors (constraint violations) are freely available and constantly updated in an automated way (by bots)70

data. license
viaf: all public data (license: http://opendatacommons.org/licenses/by/1.0/)
wikidata: all public data (license: https://creativecommons.org/publicdomain/zero/1.0/deed.it)

role
viaf: create clusters; ingest authority records from viaf contributors and other data providers (including wkd and isni); publish and diffuse viaf ids and data
wikidata: create items with a worldwide recognized and standard identifier; interlink items with any available external identifier; ingest data from viaf, from viaf contributors, and from other data providers (e.g., isni); allow the creation and maintenance on toolforge of free tools—e.g., mix'n'match—to ingest external identifiers71; manage library, bibliographic, and non-library and non-bibliographic linked data; publish and diffuse wikidata ids and data

organizational model
viaf: oclc service, guided by the viaf council of participating institutions; hierarchical, top-down; membership on request and subordinated to approval; largely limited to national bibliographic agencies
wikidata: wikimedia project; distributed, bottom-up; everyone can take part in the project72; open to any bibliographic or non-bibliographic institution (national, large, medium, and small)

system. website
viaf: interface in english only
wikidata: interface in nearly any language and script; new ones can be added; online facilities (end-user input; online edit facilities for end users); login enhances users' experience (by gadgets and scripts)

system. updating
viaf: periodical (asynchronous) ingestions
wikidata: continuous, instantaneous, free updates

system. versioning
viaf: history is included in each present cluster and for abandoned clusters; history is inaccessible in redirected clusters
wikidata: page history is available in each item and for redirected items; for deleted items, history is accessible only to administrators

long-term preservation policy
viaf: oclc maintains the hosting, software, and data for viaf73
wikidata: the wikimedia foundation maintains the hosting, software, and data for wikidata74

notifications to stakeholders
viaf: notifications to be sent to data providers
wikidata: notifications are sent to end users and contributors

display, search, and download
viaf: multiple formats: xml and json, including justlinks.json; basic search interface; clusters are listed without a clear ranking rule; integrating monthly dumps; api endpoint75; before april 2020, monthly dumps with persist links; after, monthly dumps without persist links
wikidata: multiple formats: json, php, n3, ttl, nt, rdf, jsonld, html76; search interface77; api endpoint78; sparql query endpoint79; dumps80, also customizable81; see https://www.wikidata.org/wiki/help:about_data

linked data and sru
viaf: linked data; sru82 (search and browse indexes, using cql syntax; output formats are xml or html)
wikidata: linked data

interoperability. local
viaf: local institutions can only reconcile viaf ids to their own data; as changes are made by viaf, synchronization must be periodically performed by sources and local institutions
wikidata: full reconciliation, upload, and synchronization of local ids on wikidata and vice versa; dedicated tools: mix'n'match; other tools: openrefine; bots; manual editing

conclusion
main viaf and wikidata features and personal-entity data were analyzed and compared in this study to focus on analogies and differences, and to highlight their reciprocal role and helpfulness in the worldwide bibliographical context and in the semantic web environment. viaf is a major international initiative to address the challenge of reliably identifying bibliographic agents on the web, by means of authoritative data based on national cataloguing codes and coming from the national libraries involved in the ubc program. moreover, viaf is a pillar of the identification process that users enact within wikidata. still, the comparison emphasized a few relevant issues in viaf's approach, designed more than twenty years ago: a very selective policy regarding the inclusion of its sources—contributors and other data providers—and their participation in the governance, which prevents a worldwide openness of the project to non-national libraries and cultural institutions; an obvious neutrality toward data coming from its contributors, even when data are not compliant with the identification requirements of the semantic web; problems in the correct clustering of ids (duplicate clusters to be merged and conflated clusters to be split); and a one-way flow of data due to its top-down approach that prevents a quick and cooperative workflow
to identify and fix errors; and the ability to identify only a narrow range of entities (i.e., mainly bibliographic entities, and not even all those provided by ifla lrm). on the other hand, the semantic web has offered important new tools and opportunities to libraries, archives, museums, and other cultural institutions, and their data are recognized as a relevant asset for building the backbone of the semantic web as to the control of entities of bibliographic and cultural interest. after eight years of existence, wikidata is playing a relevant role in the publication, aggregation, and control of bibliographic and non-bibliographic information in the semantic web too. it is increasingly indicated as a hub for identifiers in the semantic web.83 wikidata depends on viaf for a large part of the identification work of its items, and viaf's preeminent role in wikidata is acknowledged by its primary position in the identifiers section of each item's data. for this reason, the wikidata community constantly monitors the consistency of viaf clusters and continuously updates lists of errors present in them. in turn, if viaf is undoubtedly very useful to the wikidata community, wikidata can support the consistency of viaf clusters. the wikidata informational ecosystem is much larger and wider, can be built by any interested institution and person, and its identification function can also count on the authority work of national and non-national libraries excluded from the viaf environment, and on authoritative non-bibliographical reference sources too. this study opens some research perspectives. the analysis was limited to data about personal entities, as this kind of entity was the only one directly comparable, while further research is needed to extend the analysis to other kinds of entities.
moreover, more research should be devoted to the treatment of special categories of persons and their names, such as mythological and legendary characters, ancient greek and latin authors, kings, queens, popes, saints, and so on, as the viaf guidelines84 themselves list the clusterization of such names among viaf's typical problems (and such persons often get five or more viaf ids in wikidata). a further line of research should consider the relevance of the clusterization of encyclopedias and other reference sources in the identification process within wikidata. lastly, isolated clusters would need more consideration; as a matter of fact, in this study they were used as a clue to relatively recent uploads in viaf, but lc and dnb show a high rate of isolated clusters too (perhaps due to the richness of their collections and metadata). more research on isolated clusters could help to describe more precisely the possible role of non-national libraries and institutions, and of their locally rich collections, in identifying lesser-known agents (not just persons) in a worldwide perspective. from the analyzed data and direct comparison, it can be concluded that viaf and wikidata can be constantly improved through reciprocal comparison, which allows discovery of errors in both. viaf and wikidata are two relevant tools for authority control in the semantic web, and they each have a specific role to play and different stakeholders. unfortunately, as opposed to the relationship between viaf and isni, at present no aspect of viaf-wikidata interoperability is discussed between the managing structures of the two systems, on either a regular or an irregular basis.
while wikidata appears to be more reliable with regard to the identification process, its most significant weakness lies in its unorganized and unplanned crowdsourced data acquisition, even if it can at present count on about 11,500 active editors.85 furthermore, the wikidata community still lacks the constant support and cooperation of institutional data curators such as librarians, archivists, and museum curators. many current projects are mainly dedicated to explaining to potential institutional stakeholders the importance and usefulness of wikidata for their institutional missions, but there are still too few projects devoted to the massive synchronization of data from institutional silos to wikidata. as soon as these initiatives reach a critical mass, however, wikidata will become the real global hub of the web of data.

acknowledgements
all the authors cooperated in the drafting and revision of the article. nevertheless, each author mainly authored specific sections and subsections:
• stefano bargioni: data analysis; viaf; wikidata; viaf and wikidata: a data comparison.
• carlo bianchini: introduction; discussion; organizational model; identification function; data quantity; data quality; data maintenance and usability.
• camillo carlo pellizzari di san girolamo: relationship between viaf and libraries; relationship between wikidata and academic, research, and public libraries; relationship between viaf and wikidata; wikidata controls on viaf; materials and methods; conclusion.
all authors contributed to the comparison table. the authors wish to thank the anonymous reviewer, whose suggestions helped to improve and enrich the paper, and the editor for his helpful edits.
endnotes

1 thomas baker et al., library linked data incubator group final report, sec. 2 (w3c incubator group, october 25, 2011), http://www.w3.org/2005/incubator/lld/xgr-lld-20111025/.

2 baker et al., library linked data.

3 dorothy anderson, universal bibliographic control: a long term policy—a plan for action (münchen: verlag dokumentation, 1974), 11.

4 anila angjeli, andrew mac ewan, and vincent boulet, "isni and viaf: transforming ways of trustfully consolidating identities," in ifla wlic 2014 (ifla 2014 lyon, ifla, 2014), 2, http://library.ifla.org/985/1/086-angjeli-en.pdf.

5 rick bennett et al., "viaf (virtual international authority file): linking the deutsche nationalbibliothek and library of congress name authority files," international cataloguing and bibliographic control 36, no. 1 (2007): 12–18; barbara b. tillett, the bibliographic universe and the new ifla cataloging principles: lectio magistralis in library science = l'universo bibliografico e i nuovi principi di catalogazione dell'ifla: lectio magistralis di biblioteconomia (fiesole (firenze): casalini libri, 2008), 14–15, http://digital.casalini.it/9788885297814; "viaf. connect authority data across cultures and languages to facilitate research," oclc, 2020, https://www.oclc.org/en/viaf.html.

6 gildas illien and françoise bourdon, "a la recherche du temps perdu, retour vers le futur: cbu 2.0" (paper, ifla wlic 2014, lyon, france, 2014), 13–14, http://library.ifla.org/956/.

7 illien and bourdon, "a la recherche," 15.

8 gordon dunsire and mirna willer, "the local in the global: universal bibliographic control from the bottom up" (paper, ifla wlic 2014, lyon, france, 2014), 11, http://library.ifla.org/817/.

9 luca martinelli, "wikidata: la soluzione wikimediana ai linked open data," aib studi 56, no.
1 (march 2016): 75–85, https://doi.org/10.2426/aibstudi-11434; jesús tramullas, "objetos culturales y metadatos: hacia la liberación de datos en wikidata," anuario thinkepi 11 (2017): 319–21, https://doi.org/10/ghbj63; xavier agenjo-bullón and francisca hernández-carrascal, "wikipedia, wikidata y mix'n'match," anuario thinkepi 14 (2020), https://doi.org/10/ghbj6t; claudio forziati and valeria lo castro, "the connection between library data and community participation: the project share catalogue-wikidata," jlis.it 9, no. 3 (2018): 109–20, https://doi.org/10/ggxj9n; adrian pohl, "was ist wikidata und wie kann es die bibliothekarische arbeit unterstützen?," abi technik 38, no. 2 (2018): 208, https://doi.org/10/ghbj6w; arl white paper on wikidata: opportunities and recommendations (the association of research libraries, 2019), https://www.arl.org/wp-content/uploads/2019/04/2019.04.18-arl-white-paper-on-wikidata.pdf; regine heberlein, "on the flipside: wikidata for cultural heritage metadata through the example of numismatic description" (paper, ifla wlic 2019, libraries: dialogue for change, session 206: art libraries with subject analysis and access, athens, greece, august 28, 2019), http://library.ifla.org/2492/1/206-heberlein-en.pdf.

10 arl white paper on wikidata, 27–30; theo van veen, "wikidata: from 'an' identifier to 'the' identifier," information technology and libraries 38, no.
2 (2019): 72–81, https://doi.org/10/ghbj62; hilary thorsen, "ld4p: linked data for production: wikidata as a hub for identifiers" (slideshow presentation, june 11, 2020), https://docs.google.com/presentation/d/1jwz3_ncf5rdd-7ejetglfv99uv2pnd1v/edit?usp=embed_facebook.

11 tillett, the bibliographic universe, 15.

12 open data commons attribution license (odc-by) v1.0 (as stated in http://viaf.org/viaf/data/).

13 "viaf admission criteria," oclc, 2020, https://www.oclc.org/content/dam/oclc/viaf/viaf%20admission%20criteria.pdf.

14 the description of the wikidata source at http://viaf.org/viaf/partnerpages/wkp.html seems to refer to wikipedia before the existence of wikidata. the same acronym wkp reflects this anachronism, whereas isni correctly uses wkd. in any case, this description, as well as many others, requires an update.

15 stacy allison-cassin and dan scott, "wikidata: a platform for your library's linked open data," code4lib journal 40 (may 4, 2018), https://journal.code4lib.org/articles/13424.

16 carlo bianchini and pasquale spinelli, "wikidata at fondazione levi (venice, italy): a case study for the publication of data about fondo gambara, a collection of 202 musicians' portraits," jlis.it 11, no. 3 (september 15, 2020): 24.
17 ifla working group on functional requirements and numbering of authority records (franar), functional requirements for authority data: a conceptual model (münchen: k. g. saur, 2009), 46, https://www.ifla.org/files/assets/cataloguing/frad/frad_2013.pdf. for qualifiers, see https://www.wikidata.org/wiki/help:qualifiers; for references, see https://www.wikidata.org/wiki/help:sources.

18 partial lists are linked from https://wikibase-registry.wmflabs.org/wiki/main_page.

19 see https://www.transition-bibliographique.fr/fne/french-national-entities-file/; the proof of concept is available at https://github.com/abes-esr/poc-fne.

20 jean godby et al., creating library linked data with wikibase: lessons learned from project passage (dublin, oh: oclc research, 2019): 8, https://doi.org/10.25333/faq3-ax08.

21 ifla, "opportunities for academic and research libraries and wikipedia" (discussion paper, 2016), 10, https://www.ifla.org/files/assets/hq/topics/info-society/iflawikipediaopportunitiesforacademicandresearchlibraries.pdf.

22 john riemer, "the program for cooperative cataloging & a wikidata pilot" (slideshow presentation, june 16, 2020), slide 5, https://docs.google.com/presentation/d/1npkaqdggft1wi2vx0zgmtixwxwjpq96ntxx4mmyxffi/edit#slide=id.p.

23 godby et al., "creating library linked data," 8.
24 maximilian klein and alex kyrios, "viafbot and the integration of library data on wikipedia," code4lib journal 22 (october 14, 2013), https://journal.code4lib.org/articles/8964.

25 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp) (den haag: ifla, 2016), para. 5.3.

26 https://www.wikidata.org/wiki/mediawiki:wikibase-sortedproperties#ids_with_datatype_%22external-id%22; isni (p213, https://www.wikidata.org/wiki/property:p213) is presently sorted after viaf instead of in the iso section because it is considered primarily as a viaf source.
27 epìdosis, viaf e wikidata.mpg, 2020, https://commons.wikimedia.org/wiki/file:viaf_e_wikidata.mpg; a list of gadgets is available at https://www.wikidata.org/wiki/wikidata:viaf/cluster#gadgets.

28 the main error-report page is https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_entities; its subpage https://www.wikidata.org/wiki/wikidata:viaf/cluster/conflating_specific_entries is designed for collecting "easy" cases of conflation, when only a few members of a cluster should be moved elsewhere, while the cluster is substantially sound.

29 moreno hayley, email to author, march 23, 2020. to the question of whether data about abandoned clusters would be maintained, viaf answered, "we recognize that the data in the file was not usable. viaf is in a period of transition and it was decided that we could not at this time fix the file so it has been removed from the list of available downloads."

30 the statement read: "the persist-rdf.xml file has been removed and will no longer be available," accessed october 23, 2020.

31 angjeli, mac ewan, and boulet, "isni and viaf," 3.

32 https://dumps.wikimedia.org/wikidatawiki/; instructions and a list of kinds of data dumps are available at https://www.wikidata.org/wiki/wikidata:database_download.

33 a general explanation of ranks is available at https://www.wikidata.org/wiki/help:ranking. in brief: values of statements can be ranked in three ways, "preferred," "normal" (default), and "deprecated"; the expression "values with non-deprecated rank" includes all values with preferred or normal rank; the expression "values with best rank" includes only values with preferred or normal rank, with this condition: if the same statement has two or more values and at least one of them has preferred rank, values with normal rank are not counted; if there are no values with preferred rank, all values with normal rank are counted.
34 viaf and wikidata dumps, together with the scripts, were published on zenodo at https://doi.org/10.5281/zenodo.4457114.

35 the queries can be performed using the following links: viaf members: https://w.wiki/i5j; authority controls related to libraries but not being viaf members: https://w.wiki/i5k; biographical dictionaries: https://w.wiki/i5n.

36 the query can be performed using the following link: https://w.wiki/i5p.

37 it could be because they are probably more difficult to cluster, but in some cases also because they represent infrequently described entities.

38 as suggested by the reviewer, more removals than additions may be a clue of a cleanup project.
39 pat riva, patrick le boeuf, and maja zumer, ifla library reference model, draft (den haag: ifla, 2017), https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf; nick crofts et al., "definition of the cidoc conceptual reference model," version 5.0.4, icom/cidoc crm special interest group, 2011, http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html; chryssoula bekiari et al., eds., frbr object-oriented definition and mapping from frbrer, frad and frsad, version 2.0 (international working group on frbr and cidoc crm harmonisation, 2013), http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf; lydia pintscher, lea lacroix, and mattia capozzi, "what's new on the wikidata features this year," youtube video, october 26, 2020, truocolo, https://www.youtube.com/watch?v=ebxdzk54gru.

40 denny vrandečić and markus krötzsch, "wikidata: a free collaborative knowledgebase," communications of the acm 57, no. 10 (september 23, 2014): 80, https://doi.org/10/gftnsk.

41 for a general statistic see http://wikidata.wikiscan.org/users; for a statistic about the viaf property see https://bambots.brucemyers.com/navelgazer.php?property=p214; changing the id of the property at the end of the url allows exploring other property statistics.

42 shiyali ramamrita ranganathan, reference service, 2nd ed., ranganathan series in library science 8 (bombay: asia publishing house, 1961), 74.

43 ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5, https://www.ifla.org/publications/node/11015.

44 wikidata does have a guideline for a preferred label, and its choice is based on users' convenience (https://www.wikidata.org/wiki/help:label, par. 1.2), as required by the international cataloguing principles (2016).
as to the choice of the wikidata label in a specific language, viaf does not show any clear principle, while the authors believe that it would be preferable to use the english (“en”) label, whenever available. see ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp). 45 for example, in september it was done for nkc using openrefine (sample edit: https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=12668704 64). https://w.wiki/i5j https://w.wiki/i5k https://w.wiki/i5n https://w.wiki/i5p https://www.ifla.org/files/assets/cataloguing/frbr-lrm/ifla_lrm_2017-03.pdf http://www.cidoc-crm.org/html/5.0.4/cidoc-crm.html http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf http://old.cidoc-crm.org/docs/frbr_oo/frbr_docs/frbroo_v2.0_draft_2013may.pdf https://www.youtube.com/watch?v=ebxdzk54gru https://doi.org/10/gftnsk http://wikidata.wikiscan.org/users https://bambots.brucemyers.com/navelgazer.php?property=p214 https://www.ifla.org/publications/node/11015 https://www.wikidata.org/wiki/help:label https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=1266870464 https://www.wikidata.org/w/index.php?title=q520487&diff=1269046867&oldid=1266870464 information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 29 46 angjeli, mac ewan, and boulet, “isni and viaf,” 9. 47 simon cobb (https://www.wikidata.org/wiki/user:sic19) became wikidata visiting scholar in 2017 (https://en.wikipedia.org/wiki/user:jason.nlw/wikidata_visiting_scholar). 48 federico leva and marco chemello, “the effectiveness of a wikimedian in permanent residence: the beic case study,” jlis.it 9, no. 3 (september 2018): 141–47, https://doi.org/10.4403/jlis.it-12481. 49 angjeli, mac ewan, and boulet, “isni and viaf,” 11. 
50 andrew mac ewan, “isni, viaf and naco and their relationship to orcid, discussion paper for pcc policy committee, 4 november,” 2013, 2, http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.d ocx. 51 tom adamich, “library cataloging workflows and library linked data: the paradigm shift,” technicalities 39, no. 3 (may/june 2019): 14. 52 oclc, viaf guidelines, rev. july 16, 2019, 2, https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf. 53 oclc, viaf guidelines, 5. “when viaf is unable to algorithmically match some of the source authority records with each other, they can be manually pulled together into a single cluster using an internal table.” 54 angjeli, mac ewan, and boulet, “isni and viaf,” 16. 55 stefan heindorf et al., “vandalism detection in wikidata,” in proceedings of the 25th acm international conference on information and knowledge management, cikm ’16 (new york, ny: association for computing machinery, 2016), 327–36, https://doi.org/10/gg2nmm; amir sarabadani, aaron halfaker, and dario taraborelli, “building automated vandalism detection tools for wikidata,” in proceedings of the 26th international conference on world wide web companion, www ’17 companion (republic and canton of geneva, che: international world wide web conferences steering committee, 2017), 1647–54, https://doi.org/10/ghhtzf. 56 see table 1, col. 1 vs col. 9; it should be noted that col. 9 considers only non-viaf sources and biographical dictionaries, but wikidata also links to encyclopedias and other online databases. 57 for example, people not having viaf id but having iccu id (https://tinyurl.com/y6hbtjuo); instructions about the internal search are available at https://www.mediawiki.org/wiki/help:extension:wikibasecirrussearch. 58 https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations. 59 angjeli, mac ewan, and boulet, “isni and viaf,” 16. 60 https://www.mediawiki.org/wiki/wikibase/datamodel. 
https://www.wikidata.org/wiki/user:sic19 https://en.wikipedia.org/wiki/user:jason.nlw/wikidata_visiting_scholar https://doi.org/10.4403/jlis.it-12481 http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.docx http://www.loc.gov/aba/pcc/documents/isni%20poco%20discussion%20paper%202013.docx https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf https://doi.org/10/gg2nmm https://doi.org/10/ghhtzf https://tinyurl.com/y6hbtjuo https://www.mediawiki.org/wiki/help:extension:wikibasecirrussearch https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations https://www.mediawiki.org/wiki/wikibase/datamodel information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 30 61 “the label is the most common name that the item would be known by” (https://www.wikidata.org/wiki/help:label). see also ifla cataloguing section and ifla meeting of experts on an international cataloguing code, statement of international cataloguing principles (icp), 5., https://www.ifla.org/publications/node/11015. 62 bots exist to create more and more variant forms based on matching properties, such as date of birth (p569) and date of death (p570), and to import variant forms of names from national authority files. see, for example, https://www.wikidata.org/w/index.php?title=q5669&diff=611600491&oldid=608231160 . 63 https://www.wikidata.org/wiki/help:data_type. 64 https://www.wikidata.org/wiki/wikidata:property_proposal. 65 jenny a. toves and thomas b. hickey, “parsing and matching dates in viaf,” code4lib journal, 26 (october 21, 2014), https://journal.code4lib.org/articles/9607; stefano bargioni, “from authority enrichment to authoritybox : applying rda in a koha environment,” jlis.it 11, no. 1 (2020): 175–89, https://doi.org/10/gg66rq. 66 https://www.wikidata.org/wiki/help:dates. 
67 see heindorf et al., “vandalism detection in wikidata.” 68 see mac ewan, “isni, viaf and naco.” 69 see https://www.wikidata.org/wiki/help:merge, https://www.wikidata.org/wiki/help:split_an_item, and https://www.wikidata.org/wiki/help:conflation_of_two_people. 70 complete list at https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations (e.g., https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations/p214). 71 https://admin.toolforge.org/; see also xavier agenjo-bullón and francisca hernándezcarrascal, “registros de autoridades, enriquecimiento semántico y wikidata,” anuario thinkepi 12 (2018): 361–72, https://doi.org/10/ghbj6z. 72 https://www.wikidata.org/wiki/wikidata:property_proposal. 73 https://www.oclc.org/en/viaf.html. 74 https://www.wikidata.org/wiki/wikidata:introduction. 75 https://platform.worldcat.org/api-explorer/apis/viaf. 76 https://www.wikidata.org/wiki/special:entitydata; see also https://www.wikidata.org/wiki/wikidata:database_download. 77 https://www.wikidata.org/wiki/special:search. 
https://www.wikidata.org/wiki/help:label https://www.ifla.org/publications/node/11015 https://www.wikidata.org/w/index.php?title=q5669&diff=611600491&oldid=608231160 https://www.wikidata.org/wiki/help:data_type https://www.wikidata.org/wiki/wikidata:property_proposal https://journal.code4lib.org/articles/9607 https://doi.org/10/gg66rq https://www.wikidata.org/wiki/help:dates https://www.wikidata.org/wiki/help:merge https://www.wikidata.org/wiki/help:split_an_item https://www.wikidata.org/wiki/help:conflation_of_two_people https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations https://www.wikidata.org/wiki/wikidata:database_reports/constraint_violations/p214 https://admin.toolforge.org/ https://doi.org/10/ghbj6z https://www.wikidata.org/wiki/wikidata:property_proposal https://www.oclc.org/en/viaf.html https://www.wikidata.org/wiki/wikidata:introduction https://platform.worldcat.org/api-explorer/apis/viaf https://www.wikidata.org/wiki/special:entitydata https://www.wikidata.org/wiki/wikidata:database_download https://www.wikidata.org/wiki/special:search information technology and libraries june 2021 beyond viaf | bianchini, bargioni, and pellizzari di san girolamo 31 78 https://www.wikidata.org/w/api.php. 79 https://query.wikidata.org/. 80 https://dumps.wikimedia.org/wikidatawiki/. 81 https://wdumps.toolforge.org/. 82 https://www.oclc.org/developer/develop/web-services/viaf/authority-source.en.html. 83 van veen, “wikidata.” 84 see “typical problems” in viaf guidelines: https://www.oclc.org/content/dam/oclc/viaf/viaf%20guidelines.pdf. 
on-line acquisitions by lolita

frances g. spigai: former information analyst, oregon state university library; and thomas mahan: research associate, oregon state university computer center, corvallis, oregon.

the on-line acquisition program (lolita) in use at the oregon state university library is described in terms of development costs, equipment requirements, and overall design philosophy. in particular, the record format and content of records in the on-order file, and the on-line processing of these records (input, search, correction, output) using a cathode ray tube display terminal are detailed.

the oregon state university library collection has grown by 15,000-20,000 new titles per year (corresponding to 30,000-35,000 volumes per year) for the past three years, to a total of approximately 275,000 titles (600,000 volumes); continuing serials account for a large percentage of annual "volume" growth. these figures indicate an average input of 60-80 new titles per day. on average, a corresponding number of records are removed each day upon completion of the processing cycle.
a like number of records are updated when books and invoices are received. in addition, approximately 200 searches per day are made to determine whether an item is being ordered or to determine the status of an order. since the mid-1960's, and with the introduction of time-sharing, a handful of academic libraries (1, 2, 3) and several library networks (4, 5, 6) have introduced the advantages (7) of on-line computer systems to library routines. most of the on-line library systems use teletypewriter terminals. use of visual displays for library routines has been limited, although stanford anticipates using visual displays with ibm 2741 typewriter terminals in a read-only mode (1), and the library of the ibm advanced systems development division at los gatos, sharing an ibm 360/50, uses an ibm 2260 display for ordering and receiving (8). in addition, an institute of library research study, focusing on on-line maintenance and search of library catalog holdings records, has concluded that even with the limited number of characters available on all but the most expensive display terminals "... the high volume of data output associated with bibliographic search makes it desirable to incorporate crt's as soon as possible, in order to facilitate testing on a basis superior to that achievable with the mechanical devices." (9). many academic libraries, during shelflist conversion or input of acquisition data, use a series of tags for bibliographic information. some of these tags are for in-house use, while others presumably are used to aid in the conversion of marc tape input to the library's own input format. the number of full-time staff required to design and operate automated systems in individual academic libraries typically ranges from seven to fifteen. this does not seem an inordinate range, since most departments of a medium-large to large academic library require a similar size staff for operational purposes alone.
journal of library automation, vol. 3/4, december 1970

lolita (library on-line information and text access) is the automated acquisition system used by the oregon state university library. it operates in an on-line, time-shared, conversational mode, using a cathode ray tube (cdc-210) or a 35-ksr teletype as a terminal, depending upon the operation required. both types of equipment are in the acquisitions department of the library; each interacts with the university's main computer (cdc-3300, 91k core, 24-bit words), which, in turn, accesses the mass storage disk (cdc-814, capable of storing almost 300 million characters) through the use of lolita's programs in conjunction with the executive program, os-3 (10). under the os-3 time-sharing system, lolita shares the use of the central computer memory and processor with up to 59 other concurrent users; the use of the mass storage disk is also shared with other users of the university's computer center. (lolita will require approximately 11 million characters of disk storage.) lolita's programs are written in fortran and in the assembly language, compass, and are composed of two sets: those which maintain the outstanding order file, and those which produce printed products and maintain the accounting and vendor files. several key factors have shaped the design of lolita. an on-line, time-sharing system has been operating at osu since july 1968, and on-line capabilities have been available for test purposes since the summer of 1967. programming efforts could be concentrated exclusively on the design of lolita and an earlier pilot project (11), for no time was needed to design, debug, or redesign the operating system software, as was necessary at washington state u. and the u. of chicago (2, 12). heavy reliance was put on assembly language coding for the usual reasons, plus the knowledge that the computer center's next computer is to be a cdc-3500, with an instruction set identical to that which the library now uses. in short, neither the os-3 operating system nor the assembly language will change for the next few years. an added motivation influencing program design was the desire to minimize response time for the user. in view of the transient nature of a university library's student and civil service staff, the need for an easily learned and maintained system is paramount. the flexible display format of the crt allows a machine-readable worksheet, with a built-in, automatic tagging scheme; it obviates the need for a paper worksheet, and thus eliminates a time-consuming, tedious, and error-prone conversion process. the book request slip contains the source information for input. proofreading and correction are done on-line at time of input. alterations can be made at any later time as well. lolita has used from 1.5 to 3.0 fte through the period of design to operation. after an initial testing and data base buildup period, anticipated to last about six months, and during which lolita will be run in parallel with the manual system, it is expected that the on-order/in-process, vendor, and accounting files will be maintained automatically and that reports and forms currently output by the acquisitions department staff will be generated automatically. specifically, records comprising three files will be kept on-line: 1) the outstanding order file (a slight misnomer, since it includes and will include three types of book request data: outstanding orders, desiderata of high priority, and in-process material), 2) name and address for those vendors of high use (approximately 200 of 2500, or about 8%), and codes and use-frequency counts for all vendors, and 3) accounting data for all educational resource materials purchased by the oregon state university library.
it should be kept in mind that, although lolita is designed for book order functions, the final edited record, after the item has been cataloged, will be captured on magnetic tape as a complete catalog record. thus, all statistics and information, except circulation data, will be available for future book acquisitions. this project is being undertaken for two reasons: 1) the oregon state university library is concerned that librarians achieve their potential as productive professionals through the use of data processing equipment for routine procedures, and that cost savings may be realized as the library approaches a total system encompassing all of the technical services routines, and 2) a uniquely receptive computer center and a successful on-line time-sharing facility are available.

record format and content

each book request is described by 27 data elements which are grouped into three logical categories and are displayed in three logical "pages" of a crt screen. the categories are: 1) bibliographic information, 2) accounting information, and 3) inventory information; figures 1, 2, and 3 list the data elements in the same sequence as they appear on the crt screen. though most data elements listed are self-explanatory, eight require some description.

fig. 1. bibliographic information: order number, flag word, author, title, edition, id number, publisher, year published, notes.

fig. 2. accounting information: order number, date requested, date ordered, estimated price, number of copies, account number, vendor code, vendor invoice number, invoice date, actual price, date received, date 1st claim sent, date 2nd claim sent.

fig. 3. inventory information: order number, bib cit, date cataloged, volume, issue, location code, lc class number.

flag word

this data element indicates the status of a request. the normal order procedure needs no flag word.
exceptions are dealt with automatically by entering an appropriate flag word. as more requests are added to the system, and as more exceptional instances are uncovered, more flag words will undoubtedly be added. to date there are twelve flag words, plus one data element which serves both as a data element and as a status signal. flag words and the procedures they activate are described below.

conf.: confirming orders for materials ordered by phone or letter, and for unsolicited items which are to be added to the collection. the order form is not mailed, but is used only for processing internal to the library. accounting routines are activated.

gift: for gift or exchange items, a special series number prefixed by a "g" is assigned and the printed purchase order is used internally only. this flag word also acts as a signal so that accounting routines will not encumber any money. the primary reason for assigning a purchase order number is to provide a record indexing mechanism (this is also true for held orders).

held: selected second-priority orders being held up for additional book budget funds. these order records are kept on line, and are assigned a special series of purchase order numbers, prefixed by an "h." no accounting procedures accompany these orders, although a purchase order is generated and manually filed by purchase order number.

live: held orders which have been activated. this word causes a reassignment of purchase order numbers to the next number in the main sequence (instead of an "h"-prefixed number) and sets up the natural chain of accounting events. the new purchase order number is then written or typed on the order form, the order date added, and the order mailed.

cash: orders for books from vendors who require advance payment. an expenditure, instead of an encumbrance, is recorded.

rush: used for books which are to be rush ordered and/or rush cataloged. rush will also be rubber-stamped on the purchase order for emphasis.
no special procedures are activated within the computer programs; rush is an instruction for people.

docs: used when ordering items from vendors with whom the osu library maintains deposit accounts (e.g., government printing office). this causes a zero encumbrance in the accounting scheme; cash is used to put additional money into deposit accounts.

canc: cancelled orders. unencumbers monies and credits accounts for cash orders.

reis: used to reissue an order for an item which has been cancelled. a new purchase order containing a new order number, vendor, etc. will automatically be issued. re-input is not necessary; however, changes in vendor number, etc., can be made.

part: denotes a partial shipment for one purchase order. no catalog date can be entered while part appears as the flag word. invo will replace part when the final shipment has been received; canc will replace part if the final shipment is not received, and the order is reissued for the portion received.

invo: when invoice information is entered into the file, invo is typed in as the flag word. this causes accounting information (purchase order number, vendor code, invoice number, actual price, invoice date, account number) to be duplicated in the accounting file.

kill: used to remove an inactive record from the file (cf. date cataloged).

date cataloged: a value entered for this data element signals the end of processing. the record is removed from the main file and transferred to magnetic tape. changes and additions to inventory and bibliographic data elements are anticipated at this final point, to bring the record into line with those of the catalog dept.

author(s)

all authors are to be included in this data element: corporate authors, joint authors, etc. the entry form is last name first (e.g. smith, john a.). for compound authors, a slash is used as the delimiter separating names (e.g. smith, john a. / jones, john paul).
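the author field's conventions (entries last-name-first, compound authors separated by a slash) can be sketched as a small parser. this is an illustrative sketch only; the function name and dictionary keys are invented, not part of lolita:

```python
def parse_authors(field: str) -> list[dict]:
    """split a slash-delimited author field into entries; each entry
    is stored last-name-first, e.g. "smith, john a. / jones, john paul"."""
    authors = []
    for entry in field.split("/"):
        entry = entry.strip()
        if not entry:
            continue
        # corporate authors carry no comma; the whole string is the name
        surname, _, forenames = entry.partition(",")
        authors.append({"surname": surname.strip(),
                        "forenames": forenames.strip()})
    return authors
```

a corporate author such as "american medical association" simply yields an entry with no forenames, so the same field format serves both cases.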
id number

standard book number, vendor catalog number, etc.

order number

the order number is automatically assigned to one of three series depending on the flag word: the main number series, with the fiscal year as prefix; the held order series, with an "h" prefix (stored in the order number index as 101; the "h" is what is printed on the order forms); and the gift series, with a "g" prefix (likewise stored in the order number index as 102).

vendor code

a sample of 18 months of invoice data (obtained from the comptroller's office) for the library resource account number indicates the use of 2200 vendors during that period of time. by sorting by invoice frequency and dollar amount, about 200 vendors were identified who either invoiced the library more than 12 times during this time period (since the invoices tended to contain more than one item for frequently used vendors, the number of purchase orders issued could easily be several times this amount), or whose invoices totalled over $110.00. of these, 171 have been selected for on-line storage. they will be assigned code numbers 1 to 171, and names and addresses of these vendors will be included on the computer-generated purchase orders. authority files for all vendors are kept on rolodex units; one set is arranged alphabetically by vendor name, the other by vendor code.

account number

the library account to which the book is charged. the number is divided into four sections: 1) a two-digit prefix identification for osu, 2) a four-digit identification for osu library resource expenditures, 3) a one- or two-digit identification of the particular library resource fund account to be charged (e.g. science, humanities, serials, binding, etc.), and 4) a one- or two-digit code identifying the subject which most closely describes the request. from this data, statistics will be derived which describe expenditures by subject as well as by fund allocation.
this will provide a powerful tool for collection building and may also be a political aid in governing departmental participation in book selection.

bibcit

bibliographic citation code which cites the location, by acquisitions dept. personnel, of bibliographic data (l.c. copy, etc.). this information is included on the catalog work slip (4th copy of the purchase order) so that duplicate searching by the catalog dept. can be avoided.

lc classification number

refers to the call number as it is assigned by the osu catalog dept.

file organization

on-order record

the operating system for oregon state university's on-line, time-sharing system reads into memory a quarter page (or file block) of 510 computer words at a time. each on-order (outstanding order) record is composed of a block of 51 computer words (204 6-bit characters), or linked lists of blocks, in order to best use this system. thus, each quarter page is divided into ten physical records of 51 computer words apiece. for records requiring more than one block, the nearest available block of 51 words within the same 510-word file-block is used; but if none is vacant within the same file-block, the first available 51-word block in the file is used. if none is free, the file is lengthened to provide more blocks. a bit array is used to keep track of the status (in use, vacant) of records in the main file. in the bit array, each of 20 bits of each 24-bit computer word corresponds to a 51-word block in the main file. as in figure 4, the 13th bit has a zero value, indicating a vacancy in the 13th 51-word block of the main file; the 14th bit has a value of 1, indicating the 14th 51-word block in the on-order file is in use. a total of 10,120 block locations can be monitored by each file block of the bit array. records in this file are logically ordered by purchase order number, the arrangement effected by pointers which string the blocks together.
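the bit-array bookkeeping described above can be modelled in a few lines. this is a toy sketch, not lolita's compass code; the most-significant-bit-first ordering within each 20-bit word is an assumption, since the article does not specify bit order:

```python
BITS_PER_WORD = 20  # only 20 of each 24-bit word's bits map to blocks

def first_vacant_block(bit_words: list) -> int:
    """scan the bit array for the first 0 bit; each bit maps to one
    51-word record block (0 = vacant, 1 = in use). returns a block
    index, or -1 when the file must be lengthened for more blocks."""
    for w, word in enumerate(bit_words):
        for b in range(BITS_PER_WORD):
            if not (word >> (BITS_PER_WORD - 1 - b)) & 1:
                return w * BITS_PER_WORD + b
    return -1

def mark_in_use(bit_words: list, block: int) -> None:
    """set the bit for a newly allocated block."""
    w, b = divmod(block, BITS_PER_WORD)
    bit_words[w] |= 1 << (BITS_PER_WORD - 1 - b)
```

packing twenty block flags per word keeps the whole allocation map small enough to scan in memory, which is presumably why the original design chose a bit array over a free list.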
fig. 4. bit array monitor of record block use in the on-order file (diagram: a one-word bit array, with 4 bits unused, mapping the blocks of a 510-word file block).

access points

order number

the order number index is arranged by the main portion of the order number, and within that, it is in prefix number sequence. the sequence in figure 5 illustrates order number index arrangement (as well as the logical arrangement of the on-order file). the order number index allows quick access to selected points within the main file. conceptually, the ordered main file is segmented into strings of records whose order numbers fall into certain ranges. more specifically, items whose sequence numbers range from 0 to 4 (ignoring the prefix of the order number) comprise the first segment, 5 to 9 the second, etc. the index itself merely contains pointers to the leading record in each (conceptual) segment. thus, in the records whose purchase order numbers are shown in figure 5, there would be pointers to the second (69-124) and sixth (70-125), but not to the others. to reach the fourth (101-124) one follows the index to the second, and then follows the block pointers through the third to the fourth.

fig. 5. order number index sequence:
102-118
69-124 (fiscal year 1969, order number 124)
70-124 (fiscal year 1970, order number 124)
101-124 (held order number 124 for the current year)
102-124 (gift order number 124 for the current year)
70-125
102-125
70-126
(note: the prefix 'h,' which is printed on the purchase orders, is represented as the number 101 for internal computer processing; likewise 102 represents the prefix 'g'.)

fig. 6. "on order" record organization: p.o. number forward pointer; p.o. number backward pointer; time of last update; p.o. number; title forward pointer; title backward pointer; pointers to author(s); title; date of request; date ordered; encumbered price; number of copies; account number (2 words); vendor number; flag word; publisher; date of publication; notes; edition; id number; bibcit; lc classification number; volume number; issue; location code; vendor's invoice number; invoice date; actual price; date received; date first claim sent; date second claim sent.

author(s)

the author index is in the form of a multi-tiered inverted tree. the lowest tier is an inverted index containing the only representation of the authors' names (they are not stored in the on-order record, figure 6) and, for each author, pointers to the records of each of his books (figure 7). the entries for several authors may be packed into a single 51-word block, if space permits. each higher tier serves to direct the indexing mechanism to the proper block in the next tier below, and to this end as much as needed of an author's name is filed upwards into higher tiers; this method is described in more detail by lefkovitz (13) as "the unique truncation variable length key-word key."

fig. 7. author index organization and access to on order file (diagram: an author index directory above an inverted author index, whose entries carry a control word and pointers into the on-order file).

title

not yet programmed.
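the segmented order number index can be modelled as below. this is a minimal illustration under stated assumptions: the class and function names are invented, and the prefix portion of the number is ignored, as in the article's segmentation rule:

```python
class Record:
    """one on-order record block, strung to the next by a forward pointer."""
    def __init__(self, order_no: int):
        self.order_no = order_no  # main (sequence) portion only
        self.next = None

def build_index(head):
    """one index entry per 5-number segment (0-4, 5-9, ...), each
    pointing at the leading record of that segment."""
    index, rec = {}, head
    while rec is not None:
        index.setdefault(rec.order_no // 5, rec)
        rec = rec.next
    return index

def find(index, order_no: int):
    """enter the file at the segment leader, then follow the block
    pointers until the number is found or passed."""
    rec = index.get(order_no // 5)
    while rec is not None and rec.order_no <= order_no:
        if rec.order_no == order_no:
            return rec
        rec = rec.next
    return None
```

the design trades a little pointer-chasing within a segment for a much smaller index, which matters when both the index and the file live on a shared disk.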
on-line record processing

record creation

after a number of new book requests have been searched to determine their absence from osu's collection, and after they have been bibliographically identified, they are batched for vendor assignment and readied for entry into the on-line file of book requests via the crt (figure 8).

fig. 8. book request processing (flowchart).

lolita's starting page is obtained by typing in the word lolita on the crt screen. the text illustrated in figure 9 is then displayed on the screen of the crt. when '1' is typed in, indicating a wish to create a record, the first data element of the first page of input appears (figure 10). (since the majority of records do not need a flag word upon input, the flag word fill-in line appears only on a redisplay of this page, and the flag word may be inserted at that time.)

fig. 9. "starting" page of function choices:
main file
please indicate a choice
1. create a new entry
2. locate an existing entry
9. terminate all processing

fig. 10. first data element displayed in new record creation process:
author(s):
examples: jones
dequincey, thomas
washington, booker t.
adams, john quincy/ doe, john
american medical association

at this point the user can go in one of two directions. the first page of input information may be entered one data element at a time, each element being requested in a tutorial fashion by lolita. alternately, all of the first-page data may be input at once, with data elements separated by delimiters. the user can switch from one method to the other at any point.
a control key (return) is the delimiter used to signal the end of each data element; at the same time, return repositions the cursor (which indicates the position of the next character to be typed on the crt screen) to the location of the next data element to be filled in. another control key (send): 1) serves as a terminal delimiter, and 2) transmits data on the screen to the computer, thereby 3) triggering the continuation of processing until the next screen display is generated. thus, with page one, data elements are displayed, filled in, and sent one at a time in the tutorial approach; or all seven data elements are typed in at once, a return mark following items 1-6, then sent after the last data element. return or send must be used with each data element, even those for which there is no information. this secures the sequence of element input, thus providing an easy (for the user) and automatic way of tagging elements for any future tape searches to provide statistics or analytical reports. in particular, this process obviates all content restrictions on variable (i.e., free-form) items. each of the pages is redisplayed after input, and corrections can be made at this time. the crt is used for all input, and its write-over capabilities are utilized for corrections, as compared to the "read-only" use planned for the crt displays used for stanford's ballots (1). except for the flag word, all the data elements on the first page are variable in length and unrestricted as to content. data elements on pages 2 and 3 (figures 2 and 3) are more fixed-length in nature; thus with these pages, a whole page at a time is always filled in and sent: the tutorial function is inherent in the display. the concluding display is shown in figure 11.

fig. 11. review option: "send if all done, type 1-3 to review pages."
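the positional tagging scheme (every page-one element delimited in a fixed order, blank or not) can be modelled like this. the element names are paraphrased from figure 1 as an assumption, omitting the machine-assigned order number and the flag word, which appears only on redisplay:

```python
# page-one input sequence, paraphrased from figure 1 (assumed names)
PAGE_ONE_ELEMENTS = ["author", "title", "edition", "id_number",
                     "publisher", "year_published", "notes"]

def tag_page_one(raw: str) -> dict:
    """tag return-delimited fields purely by their position in the
    fixed sequence; an empty field still occupies its slot, which is
    what makes positional tagging safe for free-form content."""
    fields = raw.split("\n")
    if len(fields) != len(PAGE_ONE_ELEMENTS):
        raise ValueError("each element needs its delimiter, even when empty")
    return dict(zip(PAGE_ONE_ELEMENTS, fields))
```

because position alone identifies each element, the field values themselves may contain anything except the delimiter, which is the point the article makes about obviating content restrictions.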
because batched searching and input are assumed, when one search or input is finished, the program recycles to continue searching or inputting without going back to the starting page (figure 9) each time.

record search

searching programs have been completed which will search by order number and by author. title searching will be implemented within the next few months, although a satisfactory scheme for title searching (improving on manual methods, yet economical) has not been uncovered. methods suggested or used by ames, kilgour, ruecking, and spires have been noted (14, 15, 16, 17). the procedure for searching within the outstanding order file begins with the display of choices shown in figure 9. one types a "2," indicating a desire to locate an existing entry, and the text shown in figure 12 is displayed on the crt screen. at this point one chooses to search either by order number or by author. if one selects a valid order number representing a request record, the first page of that record, containing bibliographic information, is displayed. this is followed by the display shown in figure 11, so that accounting and inventory information may also be reviewed. for the user's convenience the order number is displayed in the upper right-hand corner of each of the three pages, both upon record input and search redisplay. to search by author, one types the author's name on the second line of figure 12, using the same format as that used in record creation.

________________________: order number
________________________: author
supply one of the above (start on the appropriate line)

fig. 12. display of search options.

if the author has only one entry in the outstanding order file, the first page of the entry will appear, etc. (as in the order number search above). if the author entered has more than one entry in the on-line file, information depicted in figure 13 will be displayed on the screen of the crt.
_______________: enter number or 'nf' (not found)
1. night of the iguana
2. the milk-train doesn't stop here anymore
3. cat on a hot tin roof
n. the glass menagerie

fig. 13. display of multiple titles on file for one author.

if the requested title is one of the titles displayed, one types its number and the record for that title will be displayed. if the title isn't among those displayed, typing nf would result in a redisplay of the text in figure 12 in order for searching to continue. for personal authors, variant forms of the name may be located using the following procedure. the word others is entered at the top of the screen, after an unsuccessful author search, so that a search for author j. p. jones would find all documents by john paul jones, joseph p. jones, j. peter jones, etc., as well as j. p. jones. a search for john p. jones would find all documents by j. p. jones, john jones and j. peter jones as well as john p. jones.

record changes

additions and corrections to the original record are made by first locating the record (by order number, author, or eventually, title), adding to the data elements, or writing over them (for corrections), and transmitting the information. examples of this procedure include: 1) entering the date received, 2) recording the vendor invoice number, invoice date, and actual price and 3) inserting or changing a flag word. in addition, after an item has been cataloged, the record is revised to include catalog data, as well as to exclude extraneous order notes.

output

aside from the crt displays, output is in three forms: off-line tape, printed forms and on-line files (figure 14). examples of output are library purchase orders, accounting reports, vendor data, and records of cataloged items. the number of potential reporting uses is limited only by money and imagination.

fig. 14. output from on-line on order file input.
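the variant-form author search described above behaves like an initials-compatibility test on the given names. the sketch below is a hedged reconstruction of that behavior, not the actual lolita procedure; it assumes a "surname, given names" input format.

```python
# Hedged sketch (not the actual LOLITA code) of the variant-name search:
# surnames must match exactly; each given-name token must be "compatible" --
# an initial matches any name sharing its first letter, two full names must
# agree, and a missing middle name is compatible with anything.

def _tokens(name):
    # "jones, john paul" -> ("jones", ["john", "paul"])
    surname, _, given = name.partition(",")
    return surname.strip().lower(), given.lower().replace(".", "").split()

def _compatible(a, b):
    if len(a) == 1 or len(b) == 1:   # at least one side is an initial
        return a[0] == b[0]
    return a == b                    # two spelled-out names must agree

def matches(query, candidate):
    """True if candidate could be the author the query names."""
    qs, qg = _tokens(query)
    cs, cg = _tokens(candidate)
    if qs != cs:
        return False
    # zip stops at the shorter list, so an absent middle name is compatible
    return all(_compatible(x, y) for x, y in zip(qg, cg))
```

under this rule, "jones, j. p." finds "jones, john paul" and "jones, joseph p.", while "jones, john p." finds "jones, john" and "jones, j. peter" but not "jones, joseph p.", matching the article's examples.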
fig. 15. purchase order. [form fields include: order number, date, id number, author, title, publisher, vendor name, vendor address, volumes, edition, estimated price, no. of copies, vendor code, account, date of pub., flag, gift or held order no., bib. cit.; form heading "library purchase order".]

the purchase order, shown in figure 15, is composed of four copies: 1) the vendor's copy to be retained by him, 2) a vendor "report" copy, 3) the copy which is kept as a record in the osu library, and 4) a catalog work slip to be forwarded to the catalog department with the book. purchase orders are printed on the library's teletype, which is equipped with a sprocket-feed. orders can also be printed on the line printer in the computer center. while this is a slightly cheaper data processing procedure, since no terminal costs are incurred, convenience and security have produced a victory in "economics over economies" (18), and the librarian's time has been considered in the total scheme. for gift items, purchase orders are produced as the cheapest means of preparing a catalog work slip. held purchase orders are produced and manually filed in purchase order number sequence, but when their status is changed to live, the old numbers are automatically replaced by a purchase order number in the main series. these new numbers are written onto the purchase orders, along with any other changes, and the orders are mailed. the flag word live also activates accounting procedures. there are two sets of accounting reports. the first is generated when the purchase orders are issued and contains tabulated information for the library's bookkeeper, the head of business records in the acquisitions dept., and the comptroller of the oregon state system of higher education.
the second summary report is issued after the book and invoice have been received and will contain additional information, pertinent to the invoicing procedure; this report has the same distribution as the first. periodic reports are planned for the library's subject divisions summarizing expenditures by account number, reference area, and subject. programming for this has not yet been done. a frequency count will be stored with each vendor code and periodic listings will be printed for use in retaining vendors. after an item has been cataloged, the catalog work slip and a slip equivalent to a main-entry catalog card are sent to acquisitions, and all remaining information and changes are recorded in the on-line record. this record is then transferred to a file from which it is dumped onto a magnetic tape. this off-line file will be used for statistical analyses and will be the start of a machine readable data base. future plans will, of course, depend on funding; however, two logical steps which could follow immediately and require no additional conversion are: 1) additional computer generated paper products (charge cards, catalog cards, book spine labels, new book lists, etc.), and 2) a management information system using acquisition and cataloging data. the construction of a central serial record in machine readable form would produce many valuable by-products. a program for the translation of the marc ii test tape has been written which causes these records to be printed out on the computer center's line printer; and since a subscription to the marc tapes is now available to osu for test purposes, its advantages and compatibility with lolita will be investigated as time permits. unsolved problems, aside from those which everyone working in a data processing environment faces (e.g.
system and hardware breakdown, continued project funding, and lengthy delivery times for hardware), include: 1) the widely varying system response times (commonly from a fraction of a second up to 60 seconds; usually 2-15 seconds); 2) the lack of personnel skilled in both data processing and library techniques; 3) the limited print train currently available on the line printer (62 character set); and 4) bureaucratic policy, which can render the most sophisticated plans for automation unfeasible if properly applied. it is recognized that all these problems can be solved by money, time, and priorities. meanwhile, the period of in-parallel operation will be valued as a time to educate, to test, to gather statistics, and to further refine the programs and procedures which comprise lolita.

evaluation

preliminary input samples indicate that a daily average of from 8 hours, 20 minutes, to 10 hours and 45 minutes will be necessary for input, searches, editing and corrections using the crt. an additional 3 hours per day of terminal time using the teletype will be required to produce the purchase orders, answer rush search questions if the crt is busy, and activate the daily batch programs (accounting reports, etc.). the sad economic plight of most libraries causes librarians to cast an especially suspicious eye on the costs of automation; a few words on osu's data processing costs may be of interest. the cost of total development efforts to produce lolita is under $90,000 (though considerably less was actually expended), or an average annual cost of $30,000 over a three-year period. this compares favorably with average annual incomes of from $50,000 to over $300,000 in federal funds alone for other on-line library acquisition projects in universities (19, 20, 21, 22). a total of 6.75 man-years was required to design lolita.
the 6.75 man-years comprises 2.5 years of programming, 3.25 years of systems analysis, coordination and documentation, and 1.0 year of clerical work, and represents the efforts of four students and six professional workers. this total does not include the time spent by acquisitions department personnel in reviewing lolita's abilities or in learning to use the terminals. current data processing rates charged by the computer center include the following: crt rental-$100/mo.; cpu time-$300/hr.; terminal time-$2.00/hr.; on-line storage costs-15c/2040 characters/mo. the teletype has been purchased, thus only local phone line charges are incurred. the on-line system is available for use from 7:30 a.m. to 11:00 p.m. each week-day, and from 7:30 a.m. to 5:00 p.m. on saturday, which more than covers the 8-5 schedule of the acquisitions department.

acknowledgments

the work on which this paper is based was supported by the administration, the computer center and the library of oregon state university. special mention is due robert s. baker, systems analyst, osu library, and lawrence w. s. auld, head, technical services, osu library, for their extensive participation in the lolita project and for their many suggestions which benefitted the final version of this paper. hans weber, head, business records, osu library, also contributed much to lolita's design.

references

1. veaner, allen b.: project ballots: bibliographic automation of large library operations using a time-sharing system. progress report, march 27, 1969-june 26, 1969 (stanford, california: stanford university libraries, 29 july 1969), ed-030 777.
2. burgess, thomas k.; ames, l.: lola: library on-line acquisition subsystem (pullman, washington: washington state university, systems office, july 1968), pb-179 892.
3. payne, charles: "the university of chicago's book processing system."
in stanford conference on collaborative library systems development: proceedings, stanford, california, october 4-5, 1968 (stanford, california: stanford university libraries, 1969), ed-031 281, 119-139.
4. pearson, karl m.: marc and the library service center: automation at bargain rates (santa monica, california: system development corporation, 12 september 1969), sp-3410.
5. nugent, william r.: "nelinet-the new england library information network." in congress of the international federation for information processing (ifip), 4th: proceedings, edinburgh, august 5-10, 1968 (amsterdam: north holland publishing co., 1968), g28-g32.
6. blair, john r.; snyder, ruby: "an automated library system: project leeds," american libraries, 1 (february 1970), 172-173.
7. warheit, i. a.: "design of library systems for implementation with interactive computers," journal of library automation, 3 (march 1970), 68-72.
8. overmyer, lavahn: library automation: a critical review (cleveland, ohio: case western reserve university, school of library science, december 1969), ed-034 107.
9. cunningham, jay l.; schieber, william d.; shoffner, ralph m.: a study of the organization and search of bibliographic holdings records in on-line computer systems: phase i (berkeley, california: university of california, institute of library research, march 1969), ed-029 679, pp. 13-14.
10. meeker, james w.; crandall, n. ronald; dayton, fred a.; rose, g.: "os-3: the oregon state open shop operating system." in american federation of information processing societies: proceedings of the 1969 spring joint computer conference, boston, mass., may 14-16, 1969 (montvale, new jersey: afips press, 1969), 241-248.
11. spigai, frances; taylor, mary: a pilot-an on-line library acquisition system (corvallis, oregon: oregon state university, computer center, january 1968), cc-68-40, ed-024 410.
12. university of chicago.
library: development of an integrated, computer-based, bibliographical data system for a large university library (chicago, illinois: university of chicago, library, 1968), pb-179 426.
13. lefkovitz, david: file structures for on-line systems (new york: spartan books, 1969), pp. 98-104.
14. ames, james lawrence: an algorithm for title searching in a computer based file (pullman, washington: washington state university library, systems division, 1968).
15. kilgour, frederick g.: "retrieval of single entries from a computerized library catalog file," proceedings of the american society for information science, 5 (new york: greenwood publishing corp., 1968), 133-136.
16. ruecking, frederick h., jr.: "bibliographic retrieval from bibliographic input; the hypothesis and construction of a test," journal of library automation, 1 (december 1968), 227-238.
17. parker, edwin b.: spires (stanford physical information retrieval system). 1967 annual report (stanford, california: stanford university, institute for communication research, december 1967), 33-39.
18. kilgour, frederick g.: "effect of computerization on acquisitions," program, 3 (november 1969), 100-101.
19. "university library systems development projects undertaken at columbia, chicago and stanford with funds from national science foundation and office of education," scientific information notes, 10 (april-may 1968), 1-2.
20. "grants and contracts," scientific information notes, 10 (october-december 1968), 14.
21. "university of chicago to set up total integrated library system utilizing computer-based data-handling processes," scientific information notes, 9 (june-july 1967), 1.
22. "washington state university to make preliminary library systems study," scientific information notes, 9 (april-may 1967), 6.
editorial: facing what's next, together

lita president's message
emily morton-owens
information technology and libraries | june 2020
https://doi.org/10.6017/ital.v39i2.12383

emily morton-owens (egmowens.lita@gmail.com) is lita president 2019-20 and the acting associate university librarian for library technology services at the university of pennsylvania libraries.

when i wrote my march editorial, i was optimistically picturing some of the changes that we are now seeing for lita—while being scarcely able to imagine how the world and our profession would need to adapt quickly to the impacts on library services as a result of covid-19. it is a momentous and exciting change for us to turn the page on lita and become core, yet this suddenly pales in comparison to the challenges we face as professionals and community members.

libraries' rapid operational changes show how important the ingenuity and dedication of technology staff are to our libraries. since states began to shut down, our listserv, lita-l, has hosted discussions on topics like how to provide person-to-person reference and computer assistance remotely, how to make computer labs safe for re-occupancy, how to create virtual reading lists to share with patrons, and how to support students with limited internet access. there has been an explosion in practical problem-solving (ils experts reconfiguring our systems with new user account settings and due dates), ingenuity (repurposing 3d printers and conservation materials to make masks), and advocacy (for controlled digital lending).

sometimes the expense of library technologies feels heavy, but these tools have the ability to scale services in crucial ways—making them available to more people at the same time, available to people who can only take advantage after hours, available across distances. technologists are focused on risk, resilience, and sustainability, which makes us adaptable when the ground rules change.
our websites communicate about our new service models and community resources; ill systems regenerate around increased digital delivery; reservation systems for laptops now allocate the use of study seating. our library technology tools bridge past practices, what we can do now, and what we'll do next.

one of our values as ala members is sustainability. (we even chose this as the theme for lita's 2020 team of emerging leaders.) sustainability isn't about predicting the future and making firm plans for it; it's about planning for an uncertain future, getting into a resilient mindset, and including the community in decision-making. although the current crisis isn't climate-related per se, this way of thinking is relevant to helping libraries serve their communities.

we will need this agile mindset as we confront new financial realities. our libraries and ala itself are facing difficult budget challenges, layoffs, reorganizations, and fundamental conversations about the vitalness of the services we provide.

my favorite example from my own library of a covid-19 response is one where management, technical services, and it innovated together. our leadership negotiated an opportunity for us to gain access to digitized, copyrighted material from hathitrust that corresponds to print materials currently locked away in our library building. thanks to decades of careful effort by our technical services team, we had accurate data to match our print records with records for the digital versions. our it team had processes for loading the new links into our catalog almost instantaneously. the result was a swift and massive bolstering of our digital access precisely when our users needed it most. this collaboration perfectly illustrates how natural our merger with alcts and llama is.
as threats to our profession and the ways we've done things in the past gather around us, i am heartened by the strengths and opportunities of core. it is energizing to be surrounded by the talent of our three organizations working together. i hope more of our members experience that over the summer and fall, as we convene working groups and hold events together, including a unique social hour at ala virtual and an online fall forum. i close out my year serving as the penultimate lita president in a world with more sadness and uncertainty than we could have foreseen. we are facing new expectations and new pressures, especially financial ones. as professionals and community members, we are animated by our sense of purpose. while lita has been transformed by our vote to continue as core, the support and inspiration we provide each other in our association will carry on.

journal of library automation vol. 5/2 june, 1972

automation of acquisitions at parkland college

ruth c. carter: system librarian, university of pittsburgh libraries. when this article was in preparation, the author was head of technical services and automation, parkland college, champaign, illinois

this paper presents a case study of the automation of acquisitions functions at parkland college. this system, utilizing batch processing, demonstrates that small libraries can develop and support large-scale automated systems at a reasonable cost. in operation since september 1971, it provides machine-generated purchase orders, multiple order cards, budget statements, overdue notices to vendors, and many cataloging by-products. the entire collection, print and nonprint, of the learning resource center is being accumulated gradually into a machine-readable data base.

introduction-background

parkland college, opened in 1967, is a two-year community college located in champaign, illinois.
before the librarian-analyst, who combines a library degree with several years' experience as a computer systems analyst and six months of programming training, was hired by parkland, the administration decided that automation of some library procedures was feasible. at the time the library decided to initiate automation planning (december 1970), it had a book collection just under 30,000 plus 1000 audio-visual items. the decision to automate would not have been possible unless a computer was available at the college. in the spring of 1970 when the librarian-analyst was hired, parkland owned an ibm 360/30 with 32k. before automation plans were under way, the college purchased an ibm 360/30 with 64k. the computer's increased capacity provided even more incentive for utilizing the computer for significant projects in addition to instructional and administrative functions. among the reasons in favor of automation was a general consensus indicating that automation was the way to go, and that the library with its many individual records is a natural for utilizing the computer. the automation of library acquisitions at parkland is notable for several reasons. first, automation was done relatively easily and rapidly; actual systems design and programming were completed in six months. full implementation was achieved within nine months of the formal beginning of the project. second, documentation of the system is exhaustive and is based on a detailed method of communication between the system's librarian-analyst and the programmer. third, automation in this instance was accomplished economically. fourth, the entire system can be run on an ibm 360/30 with 32k having two disk drives and two tape drives, and a standard print chain consisting of just upper-case letters.

what to automate?

this, of course, is a crucial question. where out of the various alternatives of circulation, acquisitions, cataloging, and others does one begin?
neither the librarian-analyst nor the rest of the library staff made any attempt to work out an answer during the fall of 1970. the librarian-analyst, as head of technical services, spent the first four months concentrating on cataloging and learning the problems in the acquisitions area. by december she was ready to begin planning for automation. meetings were arranged with the director of the learning resource center and the director of the computer center. informal discussions with the library staff were held. circulation was eliminated early from consideration, since parkland is in temporary quarters. it seemed more logical to develop the area of circulation with the move to the permanent campus. in addition, the volume of circulation did not appear to warrant the time and personnel commitment necessary to develop a comprehensive system at this time. several possibilities remained: the acquisition of new materials, conversion of our whole catalog, and periodicals control, including automatic claim notices. periodicals seemed the least likely of the three, because our holdings numbered less than 700, and it was felt that the volume involved did not justify the effort and expense of going to a computer system, particularly the first computer system within the library. converting the whole catalog had some positive arguments. it would provide a data base for later circulation efforts and also make it possible to produce bibliographies and other service features for faculty members. however, this idea was discarded due to the large initial data-conversion problem, and because it did not provide relief for existing problems within the library. the library staff concluded that acquisitions had first priority for automation.
to this the director of the computer center heartily agreed on the grounds that it was a conventional data processing type of application, and it would dovetail with existing data bases already maintained for administrative purposes, in particular, the vendor file and financial reporting files. furthermore, the library could then produce its encumbrance data to be entered into the budget programs for the business office accounting records. from the standpoint of the library staff, it was believed that by utilizing the computer in acquisitions we could improve the overall staff utilization in the area. probably the strongest point is that, while we did not expect clerical work time to be decreased, its nature would be changed. one specific function to be eliminated was the manual bookkeeping done, although a machine system would still require checking for accuracy. we expected that the acquisitions librarian, once freed from some routine responsibilities concerning the budget, would be able to devote that time to more professional activities. other advantages in automating acquisitions were: more accurate and up-to-date information, especially in regard to budget figures would be available; human errors in sending out orders would be cut down; and statistics on orders could be compiled automatically. at this point, as well as previously, the literature was searched for relevant discussions of acquisitions systems and/or mechanization applications in small libraries. relatively little had appeared in print describing library automation in junior colleges. those articles found to be helpful included: burgess, cage, corbin, dobb, dunlap, macpherson, morris, and vagianos (see references 1-5 and 7-9). also, hayes and becker's handbook of data processing for libraries (6) became available at this time.
it was especially useful for the summary of features usually present within the scope of standard acquisitions applications. along with use of the literature, several visits to other libraries with operational systems were made. a visit of particular importance was made in january (1971) to study an established off-line acquisitions system. as soon as there was general agreement on proceeding with plans for acquisitions, a list was prepared of the criteria the library staff would expect from the automation of acquisitions. the list items included:

1. the system should be open-ended, i.e., it should be planned with other potential future systems in mind.
2. it should handle the preparation of outgoing forms such as purchase orders, book-order cards, notifications to faculty requestors, and overdue notices to vendors.
3. the system should perform bookkeeping functions and provide many different access points for inquiry into the data base.
4. there must be a status list of items in the acquisitions process, up to and including the point of receiving cataloging.
5. it should have as much automatic editing of input data as possible.
6. the system must have flexible updating and file maintenance routines.
7. it should provide the library staff with decision-aiding information including many of our previously manually maintained statistics.
8. it must be flexible.
9. it should maintain simplicity. and,
10. it should provide better service to the faculty through faster and more accurate ordering and notifications.

along with the criteria for an acquisitions system, a possible sequence of automation development was submitted. this was to provide a means for keeping clearly in mind that, while acquisitions would get first attention, this was only a starting point, and that the system should be planned in such a manner as to facilitate its compatibility with future developments.
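criterion 3 above asks the system to perform bookkeeping functions. the core of library fund accounting — encumber the estimated price when an order is placed, then relieve the encumbrance and book the actual price when the invoice is paid — can be sketched as follows. this is an illustrative model only, not parkland's program; all names are invented.

```python
# Illustrative fund-accounting sketch (not Parkland's actual system):
# ordering encumbers the fund by the estimated price; paying the invoice
# relieves that encumbrance and records the actual expenditure.

class Fund:
    def __init__(self, budget):
        self.budget = budget
        self.encumbered = 0.0
        self.expended = 0.0

    def encumber(self, estimate):
        self.encumbered += estimate

    def pay(self, estimate, actual):
        self.encumbered -= estimate   # relieve the original encumbrance
        self.expended += actual       # book the invoiced amount

    @property
    def free_balance(self):
        # money neither spent nor committed to outstanding orders
        return self.budget - self.encumbered - self.expended

fund = Fund(budget=1000.00)
fund.encumber(7.95)
fund.pay(7.95, 6.75)                  # invoice came in under the estimate
# fund.free_balance -> 993.25
```

keeping estimate and actual separate is what lets a budget statement show both outstanding commitments and real expenditures, the two figures the manual bookkeeping had tracked by hand.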
as originally stated, acquisitions, strictly speaking, represented phase 1, and materials added to the collection were phase 2. however, phases 1 and 2 were planned and programmed at the same time. thus, from the beginning, parkland college has included in its system cataloging information such as the complete call number, and up to three subject headings of fifty characters each. the decision regarding number and length of subject headings will be discussed later. (see master record layout at figure 1.)

time estimate-schedule

in january, 1971, a proposed time estimate (see figure 2) was submitted to the director of the computer center for his approval. this time estimate was prepared with the goal of automating acquisitions beginning with the fiscal year 1972 (i.e., july 1971). the proposed schedule also took into account the fact that most of the librarians were expected to be on vacation all (or at least most) of august, and also that during september, with the registration of students and other demands on the computer resulting from the beginning of a new academic year, computer time and personnel would be tight and probably could not provide the necessary support to a system still in its developmental stages. the schedule called for the librarian-analyst to begin full-time work on analysis on february 15 with final implementation of the system by the end of july. preparation of this estimate was based on computer output if everything went right. it was an extremely rigorous schedule. considering that problems did arise, the implementation of this system during the first week of august is truly notable. of course, bugs remained after the system was actually in operation, and, as with all systems, changes were still being made several months later both in specifications for programming and in the programs conforming to the specifications.
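the master record of figure 1 is a fixed-length, 400-character layout, so fields are recovered by positional slicing. the sketch below uses the two subject-heading slices actually given in the layout (positions 301-350 and 351-400); the placement of the first heading at 251-300 is a guess inferred from the fifty-character heading size, not stated in the article.

```python
# Sketch of positional field access on Parkland's 400-character master
# record. Positions 301-350 and 351-400 come from fig. 1; the 251-300
# slice for the first subject heading is an assumption for illustration.

LAYOUT = {
    "subject_heading_1": (251, 300),  # assumed, not from the article
    "subject_heading_2": (301, 350),  # stated in fig. 1
    "subject_heading_3": (351, 400),  # stated in fig. 1
}

def field(record: str, name: str) -> str:
    """Extract one blank-padded field by its 1-based, inclusive positions."""
    if len(record) != 400:
        raise ValueError("master record must be exactly 400 characters")
    start, end = LAYOUT[name]
    return record[start - 1:end].rstrip()

rec = " " * 250 + "HISTORY".ljust(50) + "EDUCATION".ljust(50) + "ART".ljust(50)
# field(rec, "subject_heading_2") -> "EDUCATION"
```

fixed positions are what let an ibm 360/30 batch program, with no parsing logic, read the same field from every record in a 400 x 9 blocked tape file.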
when the time estimate was submitted, it was also necessary to make firm decisions regarding personnel to perform all the necessary tasks. the librarian-analyst assumed responsibility for all systems analysis and program definitions. the library staff supplied the keypunching support. one clerk had been hired previously because of her keypunch training. on july 1, an additional clerk was hired with this skill. the main problem was programming, because the computer center did not have the full-time personnel to support a major new effort. this was resolved by hiring a programmer on a special three-month contract running from april 15 to july 15, 1971. prior to implementation, the library was forced to rely on the availability of keypunch machines at the computer center. in september 1971, an ibm model 129 keypunch and verifier was installed in the technical services department of the library. a model 129 was chosen for the library in conformance with the initial requirement set by the director of the computer center-that all library data for the computer be verified. this has proven to be a wise decision, as we have had relatively limited problems with invalid or erroneous data.

fig. 1. master record layout. [tape layout form prepared by r. carter; library master files: on order, in process, history; record length and blocking 400 x 9; positions 301-350 = subject heading no. 2; positions 351-400 = subject heading no. 3.]

requirements specification phase (analysis)

three weeks were allowed for identification and specification of all output desired from the initial system.
many of these requirements were alluded to in the preliminary list of criteria for the system. to meet the library's needs, we decided that the system must produce: purchase orders; individual order cards (including a copy used to order catalog cards from the library of congress); budget statements including all encumbrances and payments as well as other financial data; lists of all books on order, in process, or cancelled; notices to vendors regarding items on order more than 120 days; notices to each faculty member of the additions to the collection of items they requested, complete with call number; and a monthly accession list of all newly cataloged items that could be circulated to all faculty members.

    development steps                             time required   date to start   date to complete
    i.   requirements specifications              3 weeks         feb. 15         march 5
    ii.  detailed design-system flow              3 weeks         march 8         march 26
    iii. detailed design-programming
         specifications                           10 weeks        march 29        june 4
    iv.  programming-acquisitions                 10 weeks        april 15        june 23
    v.   programming-materials accessioned        3 weeks         june 24         july 14
    vi.  computer program system test-
         acquisitions & materials accessioned     2 1/2 weeks     july 1          july 26
    vii. implementation                                                           july 1971

fig. 2. time estimate for automation of acquisitions at parkland college as submitted in january 1971. a beginning and ending date for each phase is indicated and the actual time in weeks required is shown.

124 journal of library automation vol. 5/2 june, 1972

once it was known what forms were required, orders were placed for the necessary pre-printed forms. with some outside advice in the matter of forms suppliers, specifications for three new forms were delineated, two of which would be for use on the computer. the first form encountered in outlining the acquisitions process was a request form. the request form is used to make a record of all items ordered and to serve as a checklist in the searching process (see figure 3).
later, it is stamped with a six-position control number and serves as the source document for keypunching new orders, which require three input cards per item ordered. the request form is then retained in control-number sequence until the item has completed its way through the technical services process. specifications for the purchase orders were drawn up by parkland's business manager. the machine-generated purchase orders used by parkland are almost identical to the conventional manual purchase orders used throughout the college. in this case, automation of the library's purchase orders is a likely precursor to automation of the purchase orders for the remainder of the college. the most complicated form to design, from the library's viewpoint, was the individual order form. this was required in five parts, including a copy complying with library of congress specifications for use with ocr equipment. (this is illustrated in figure 4.)

(fig. 3. request form, used as a control record for each item ordered. fields include searching checkboxes (bip, pbip, card catalog), fund, vendor, format code, author, title/vol., publisher, year, no. of copies, series/edition, lc card no., requester, control no., order code, price, and sbn.)
(fig. 4. copies one and two of the multiple-part order form: the original copy, used to order catalog cards from the library of congress, and the second copy, used to send to the vendor.)

it was important to determine forms requirements early, as it was anticipated that several months' time would elapse before they would be received. naturally, it was desired that the forms be on hand by the time the programs would be ready for testing, which was planned for late june or early july. one of the most critical parts of the requirements specification phase was the determination of data elements to be included in the master records. perhaps the most perplexing of those possibilities considered was subject headings. since we wanted an open-ended system which would leave us some room for future development, without major modifications, a decision was made to include three 50-character subject headings in each record. here we were limited because of the decision made (for purposes of simplicity of design and programming) to confine the system to fixed-length records. it was considered desirable for storage purposes to keep the master record length within 400 characters. while the decision on subject headings may not prove to be adequate in the long run, it does give parkland's library a good starting point for some projects using subject headings, such as developing bibliographies on demand. despite possible future modifications to the data base, all items going into the history (master) file included headings as defined above.
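the fixed-length master record described above can be sketched as follows. this is a minimal illustration in python rather than the cobol of the original system: the 400-character length and the positions of subject headings no. 2 and no. 3 come from fig. 1; placing subject heading no. 1 at positions 251-300, and the helper names, are our own assumptions.

```python
# a sketch of a 400-character fixed-length master record with three
# 50-character subject-heading fields. positions for headings 2 and 3
# follow fig. 1; heading 1's slot (251-300) is an assumed extension.
RECORD_LEN = 400
FIELDS = {
    "subject_heading_1": (250, 50),  # positions 251-300 (assumed)
    "subject_heading_2": (300, 50),  # positions 301-350 (per fig. 1)
    "subject_heading_3": (350, 50),  # positions 351-400 (per fig. 1)
}

def pack(values):
    """build a 400-character fixed-length record, blank-filled."""
    record = [" "] * RECORD_LEN
    for name, text in values.items():
        start, length = FIELDS[name]
        record[start:start + length] = list(text[:length].ljust(length))
    return "".join(record)

def extract(record, name):
    """pull one field back out of the record and strip blank padding."""
    start, length = FIELDS[name]
    return record[start:start + length].rstrip()
```

the blank-fill and truncate-to-field-width behavior mirrors the trade-off the article describes: fixed-length records are simple to design and program, at the cost of wasted space and hard limits on field sizes.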
additional determinations made in the initial phase regarded files to be maintained. here a crucial factor was the physical limitations of the college's computer system. as only two tape drives and two disk drives comprised the primary storage facilities, the capability for performing sorts was limited. in fact, one of the disk drives was reserved strictly for systems programs, and could not be utilized directly by the library. this contributed to the decision to maintain separate on-order and in-process files, as well as a history file, on tape. the college vendor file and the library budget file are maintained on disk. a final area of effort in the initial phase was developing codes to be utilized throughout the system. naturally, many conditions would be indicated in the computer records by the use of a one- or two-position code. one example is the format code, a one-position code, which indicates the types of items used, such as: b=book, r=record, and s=filmstrip.

design phase-system flow

three weeks were allotted to developing the overall systems flow chart. this time was spent working out each separate program that would be required, and flow-charting the entire series of programs. a flow chart of the system (without minor additions dating after september 1971) is shown in figure 5. however, it does not necessarily indicate the sequence in which programs are run. in general, maintenance of each of the separate files is run prior to new data. this procedure has proved to work well.

(fig. 5. system flow chart.)

in most cases, pre-sorting of card input is provided. this decision was not based on optimum efficiency but on compatibility with routine procedures and facilities in the computer center.
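the one-position format codes described earlier can be sketched as a simple lookup. only the codes b, r, and s are given in the article; the function name and the handling of unlisted codes are our own illustration.

```python
# the one-position format codes from the article. only b, r, and s are
# documented; any other code is flagged rather than guessed at.
FORMAT_CODES = {"b": "book", "r": "record", "s": "filmstrip"}

def describe_format(code):
    """translate a one-position format code, flagging anything unlisted."""
    return FORMAT_CODES.get(code, "unknown format code: " + code)
```

a table like this keeps the edit criteria for valid codes in one place, which matters later in the article when codes and constants had to be changed across many programs.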
design phase-program specifications

one of the most significant parts of the development of parkland's automated library acquisitions system is the exhaustive documentation provided by detailed written specifications for each program in the system. each program, including utilities such as sorts, was assigned a job number and then described under each of the following topics: purpose, frequency, definitions (any unusual terms), input, output, and method. a format was provided for each input and output, whether it was a card, tape, disk, list, or other printed report or form. these accompanied each individual program specification. the method section is particularly important. here the librarian-analyst stated the procedure used to arrive at the given output based on the given input. any necessary constants were defined. because the librarian-analyst has had programming training, these specifications are detailed to the point where the programmer does not have to do much more than code the problem, making it possible for programming to proceed quickly. this thorough problem definition for each program by the librarian-analyst was one of the major factors (perhaps the primary key) in acquisitions being accomplished rapidly and efficiently. it had the advantage of obviating the need for a senior programmer, or for having someone from the computer center become highly involved in the analysis of library details. furthermore, and perhaps most important, it provides the detailed documentation of the system. there should be no doubt as to the procedures within each program. an example of a specification for one of the programs in the parkland college library acquisition series is presented in the appendix. it should be mentioned that most of the programs are written in cobol. there are a few in assembler, and some minimal use is made of rpg.
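the per-program specification structure described above can be sketched as a small data type. the topic names (purpose, frequency, definitions, input, output, method) and the job number come from the text; the dataclass itself and its field types are our own illustration.

```python
# a sketch of the parkland program-specification structure: one record
# per program, with the documented topics as fields.
from dataclasses import dataclass, field

@dataclass
class ProgramSpec:
    job_number: str
    purpose: str
    frequency: str                                   # e.g., "weekly"
    inputs: list = field(default_factory=list)       # card, tape, disk, or form layouts
    outputs: list = field(default_factory=list)      # lists, reports, printed forms
    method: str = ""                                 # procedure from input to output, constants defined
    definitions: dict = field(default_factory=dict)  # any unusual terms
```

keeping every program's definition in one uniform shape is precisely what the article credits for letting the programmer "do little more than code the problem."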
testing of the program

the original plans called for testing with test data which would proceed simultaneously with programming. however, as things developed, most coding was done prior to very much testing. as a result, the period originally devoted to live-data testing of the whole system was instead devoted to testing the programs with test data. thus, in early july, we were about two weeks behind the original time estimate, and that gap remained to the end. the usual problems showed up in testing with test data. moreover, during the first week of july, it was learned that the business office was changing the length of the account numbers from 9 to 11 positions. fortunately, space had been planned for up to a 12-position field, so the lengthened number could be easily accommodated by the system. however, the changing of numbers required modification of any program which edited data for valid account numbers. this was a minor problem and easily resolved. on july 15 the programmer completed the job for which he was hired, i.e., to complete a programming and systems test utilizing live data and to make appropriate changes as identified during testing. since not even test-data testing was complete on july 15, he stayed until july 20 and finished that work. meanwhile, the director of the computer center had already selected the individual to be the operator when the library's jobs were being run on a regular basis. this employee would also provide program maintenance. on july 21, this permanent staff member took over programming. for the next two weeks, while summer school classes were in session, most of the trial runs of the library series had to be done during evenings, nights, and on weekends. by the end of july, most of the major bugs appeared to be out of the programs.

impact on technical services

success on the first usable purchase order and order cards came on august 3.
within the next day or two, a workable budget statement was produced, along with a wits list (work in technical services). by august 13, when the vacation time came, nearly one thousand books had been ordered via the automated system. while a few bugs remained to be dealt with in september, the system was accomplishing its basic mission essentially on time. it took less than eight months to identify requirements and to design, program, and test a system consisting of twenty-seven programs in its original design! during the remainder of 1971, various bugs were found and, it is to be hoped, eliminated from the system. more bugs occurred in the budget series than in any other single segment of the system. over a period of several months, these were worked out; as of march, 1972, the budget sequence of programs worked smoothly.

implementation

following the implementation of the automated technical services system, several effects were evident. an obvious effect was the saving of two to three days per month formerly spent on bookkeeping. on the other hand, one permanent staff member was added to technical services. this addition had two causes: the keypunching load, and the fact that many more books were ordered directly from publishers, with a consequent major increase in processing in-house. therefore, much of what was expended in salary for the extra clerk was saved by eliminating most prepaid processing costs. for several months after implementation, some duplication of effort was required, especially by acquisitions personnel. thus, the total effect on changing the nature of work was not immediately obvious. by march 1972, duplication was essentially phased out, and more realistic assessments of the impact of automation in changing the nature of the workload are now being made. one of the most obvious changes is the increased number of bills to be approved for payment.
by utilizing the computer to batch purchase orders and order cards, almost all materials are now ordered directly from publishers, rather than pre-processed from a jobber. although the speed with which items are received and processed has increased substantially, there has been a corresponding increase in paperwork in this regard.

additional services

besides the immediate effects of the automation of acquisitions within technical services, other parts of the library and the college felt the impact. this is especially true of reference, which now has a weekly updated listing of all items on order, in process, or cataloged within the last month, in both author/main entry and title sequence. budget statements are now available to the director of the learning resource center and other personnel on a weekly rather than monthly basis. not only are they received sooner, but they provide more information than is present in the statement originating from the computer center. a useful fringe benefit is the availability of overdue notices to vendors when items have been on order more than 120 days. a computer-generated notice is sent each week to faculty members regarding items requested, cancelled, or cataloged. the response of the library staff and the rest of the faculty to the automated system has been very favorable.

cost

at this date (march 1972), costs are difficult to assess, but certainly seem minimal. the only direct costs are the installation of a 129 keypunch, which rents for $170 per month, plus the salary of the extra staff member for keypunching. however, the extra salary is compensated for by no longer ordering items pre-processed at an average cost of $2.05 per item. naturally, there is some local cost for processing materials such as pockets and labels, but it is minor on a per-volume basis. in addition, by being processed locally, materials are available to the users much more rapidly.
among other costs, the learning resource center had to pay a three-month salary for a programmer. other computer support, whether personnel or machine time, has not been directly billed to the library. analyst time is absorbed, in part, in general library salaries, as the librarian-analyst is also head of technical services and is responsible for original cataloging. about one-half of her time is devoted to automation activities. as an indirect cost of automation, it is reasonable to include the cost of a special summer project contract of about $1500 for the reference librarian to catalog a-v materials. this was necessary because the librarian-analyst was directly involved with automation and thus not able to keep up with all media of materials to be cataloged. purchase-order forms previously covered by the business office budget cost the library $900. however, it was a two-year supply which was paid for by money the college, if not the library, would have expended anyway. the multiple-order forms for computer use exceed the cost of more standard forms by several hundred dollars per year. the library also expends about $400 per year to buy punch cards and magnetic tape. some direct savings resulted from what are by-products of the automated system, but which were previously done manually. these include production of a monthly accession list and notices to faculty members of items they requested which were ordered, cancelled, or cataloged. the accession list was previously compiled by xeroxing, in ten copies, the shelflist card for all items added to the collection during a month. this involved both xerox charges and student assistant time. notices to faculty were previously sent out by both the order and processing sections. now these notices are consolidated, which produces savings in addressing time, as well as eliminating manual production of each notice.
overall, in calculating costs and savings, direct and indirect, it appears at this point that parkland has automated many library routines very inexpensively, although specific cost figures remain to be determined. with the availability of a similar computer, many other libraries should be able to undertake automation of certain basic functions without large expenditures of either money or personnel time.

problems

as with all automated efforts, some problems were encountered at almost every stage of development. taken as a whole, these were minor and, for the most part, few hitches were encountered. however, so that others may profit from the library automation experience at parkland, those problems will be discussed. the major problem was the original programmer of the series. this person was not a regular employee of parkland and was not concerned with being retained. since he was not part of the staff, he worked erratically and frequently was hard to get hold of. we were working on a tight time schedule, and it was very important to maintain close supervision of the progress being made, although sometimes this was difficult. in addition, even though it was strongly desired that tests be conducted throughout the three-month period, the programmer waited until all coding and compiling was completed before beginning even test-data testing with most programs. fortunately, it worked out satisfactorily, as the regular staff member of the computer center, who presently runs our jobs and does program maintenance, took over in mid-july and was available for live-data tests. all staff members directly involved with automation worked very hard the last two weeks of july and the first week of august to complete testing with live data. the programs were further refined during august and september, and most of the bugs were out by early fall. naturally, changes in specifications continued to be made, and our acquisitions system is definitely not static.
the lesson we learned from the experience with the initial programmer is that, if a regular staff member of the institution can be assigned to the development of programs for the library, avoiding other assignments during that time period, a more satisfactory response can be achieved from the programmer. also, in such an operation it would be possible to monitor progress on a more regular basis. another group of problems arose in connection with the new forms required for the automated system. fortunately, these were not serious. the forms arrived later than they were promised, and, without exception, their cost was about 25 percent more than the original estimates. because custom forms can take a long time to be completed, it is wise to identify output requirements early in the development of an automated system, so that the forms can be completed and delivered when the system is ready for final testing and implementation. a few minor problems revolved around decisions made in file design. to conserve space and hold down the size of the master record, it was decided to pack numerical fields. this would have been satisfactory if packing had been limited to such fields as the julian date (e.g., 72001 rather than 01-01-72; this form of the date was used to provide easy computation when calculating overdue orders). unfortunately, fields such as the numerical part of the lc card number and the parkland college account numbers were also packed. no problem existed except when the lc card number was blank at order time; then the lc number printed as zeros. of course, these could be suppressed once the problem was identified, although it was decided to make space to unpack the field.
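the julian-date packing described above can be sketched as follows: the date is stored as a five-digit yyddd number (72001 for january 1, 1972), which makes overdue-order arithmetic easy. the helper names and the use of python's datetime module are our own; the 120-day threshold is the vendor overdue-notice cutoff mentioned earlier in the article.

```python
# a sketch of julian-date packing (yyddd) and the overdue-order check
# it was meant to simplify. names and the 120-day default are from the
# article's description; the implementation is illustrative.
import datetime

def to_julian(d):
    """pack a date as yyddd: two-digit year followed by day of year."""
    return (d.year % 100) * 1000 + d.timetuple().tm_yday

def is_overdue(order_date, today, threshold_days=120):
    """true when an order has been outstanding longer than the threshold."""
    return (today - order_date).days > threshold_days
```

note that the yyddd form is compact and sorts correctly within a year, but comparing dates across a year boundary (or across 2000) needs real date arithmetic, which is why the check above subtracts dates rather than packed numbers.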
it was learned that packed fields always print zero when unpacked, unless this is specifically suppressed, and also that it is impossible to debug packed fields on routine file dumps unless the dump is requested with provisions for unpacking and reformatting, because packed fields print blank when they are dumped. other minor difficulties included:

1. the print chain did not print colons or semicolons, except as zeros; therefore, the library's records all contain commas instead.
2. in the midst of programming the account numbers, all the college's funds were changed, thus requiring the change of constants and edit criteria in many programs.
3. as originally specified for input, the lc classification number did not sort in shelflist order; for instance, bf 8 sorted after bf 21. this was eventually remedied by left-justifying the letters and right-justifying the numbers within separate fixed fields.
4. routine delays for machine repair and maintenance were a concern, since it is necessary to adhere to a tight schedule in systems development.

future development

as is so frequently the case, now that parkland is committed to automated functions within the library, more and more applications are seen. even the former skeptics on the staff are enthusiastic, and all the professionals have made suggestions for the future. several additions to the acquisitions system were made in the first six months following implementation of the system. these included a list of purchase orders sequenced by vendor and enlarging the machine-generated notices to faculty requestors to cover items ordered and cancelled. various additions have been made in several programs originally part of the system, which expand the services the system can provide for the library staff. many more minor modifications and supplementary features in acquisitions have been identified for inclusion in the system, and will be added as time permits.
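the shelflist-sort remedy in the problems above can be sketched as a sort key: left-justify the letters and right-justify the digits in separate fixed fields, so that bf 8 collates before bf 21. the field widths (3 letters, 5 digits) are our own assumption for illustration.

```python
# a sketch of the shelflist sort fix: fixed-field justification so that
# lc class numbers compare correctly as plain character strings.
def lc_sort_key(call_number):
    """build a fixed-field sort key from a 'letters digits' class number."""
    letters, numbers = call_number.split()
    return letters.ljust(3) + numbers.rjust(5)

shelf_order = sorted(["bf 21", "bf 8", "b 99"], key=lc_sort_key)
# -> ['b 99', 'bf 8', 'bf 21']
```

without the justification, a plain character compare puts "bf 21" before "bf 8" because "2" precedes "8"; padding the digits on the left restores numeric order while keeping a simple string sort, which is all the era's card-sorting equipment could do.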
the first additional area to benefit directly from computer availability has been periodicals. without involving complicated programming, the periodicals holdings have been converted to a card file which is then listed directly, card by card, without changes, except for suppression of a control and sequence number. nothing more is planned for periodicals in the near future, because the new card file enables the master holdings list of 800 titles to be updated in technical services by the periodicals assistant, who also keypunches one-half time. the time-consuming retyping of the holdings list is now eliminated, and multiple copies of up-to-date holdings lists can be produced more frequently with less effort. another new area, for which programming specifications were released in december 1971, is reference. in this system it is hoped that subject bibliographies and holdings lists, based on library of congress classification, can be produced. this system will have a multitude of purposes, one of the primary ones being to give better service to our faculty members. we get many requests for copies of portions of our shelflist or other extracts of holdings. rather than filling these requests by xeroxing cards or tedious typing, a few extract specifications will permit computerized retrieval and printing. also, search time in the catalog will be cut down considerably. in the subject bibliographies, the library plans to be able to extract on any heading, stem of a heading, or any part of a heading, thus getting much more flexibility than in manual use of the card catalog. programming for this is currently under way, and after the system has been completed and is operational, some interesting results should be identified. by including three subject headings of fifty characters in our original file design, it was possible to design and program the reference series as a spin-off of the acquisitions-technical services system with a minimum of additional effort.
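the subject-bibliography extraction described above can be sketched as follows: select records on a whole heading, a stem (prefix) of a heading, or any part (substring) of a heading. the record structure, function name, and mode labels are illustrative assumptions, in python rather than the cobol of the original system.

```python
# a sketch of extract-by-heading: whole heading, stem, or any part.
def extract_by_heading(records, term, mode="part"):
    """return records with at least one subject heading matching the term."""
    term = term.lower()

    def hit(heading):
        h = heading.lower()
        if mode == "heading":      # match the whole heading
            return h == term
        if mode == "stem":         # match a heading beginning with the term
            return h.startswith(term)
        return term in h           # "part": match the term anywhere

    return [r for r in records if any(hit(h) for h in r["headings"])]

shelflist = [
    {"title": "handbook of data processing for libraries",
     "headings": ["data processing", "library automation"]},
    {"title": "lighter than a feather", "headings": ["fiction"]},
]
matches = extract_by_heading(shelflist, "automation")   # "part" match
```

the three match modes correspond directly to the flexibility the article claims over manual use of the card catalog, where only left-to-right filing order can be browsed.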
even if it is eventually decided to lengthen either the number or size of the subject headings contained in parkland's file, useful services will have been provided under the original design, as well as a base for further decisions and developments. other projects which are being considered for future action are serials holdings (in parkland's case, mostly annuals and yearbooks which get cataloged), including an anticipation list, and management statistics consisting of holdings percentages by class letter versus collection additions and circulation figures by class letter. circulation itself will undoubtedly not be designed prior to actual residence on the permanent campus (anticipated for fall 1973), but all of the above are possibilities and some will receive attention in the immediate future. by building a data base which includes subject headings and call numbers, many future projects will be practical to consider, as the file maintenance programs and the data base will already exist. these, of course, may be modified from time to time to meet changing conditions and requirements. additionally, parkland's library staff has been following cooperative library automation efforts involving other libraries, and would happily consider participation in appropriate cooperative ventures.

conclusion

in the opinion of both the library and computer staff, the automation of acquisitions is a success. it was accomplished rapidly, essentially on time, and economically, with few costs higher than originally anticipated. now that the system is operating smoothly, with only an occasional bug cropping up, the extra workload caused by parallel operations has been phased out, and the total efficiency of the system should continue to improve. the system to date has been running on a weekly basis, and this has proved satisfactory to both the computer center personnel and the library.
the library is among the first parts of parkland to be on a regular weekly schedule using the computer. most other processing is on a monthly and quarterly cycle. in approaching any automated systems development, a general attitude of flexibility combined with thoroughness is very important and will probably bring the best long-term results. by being flexible and open-ended, regardless of what portion of a library's functions were originally automated, the way will be paved to provide a data nucleus for other applications in areas of the library. thoroughness in design and attention to initial detail are also important, as sometimes it is harder to find the time to make changes than was expected. there is probably a tendency to get along with an operational system as it is, rather than making minor non-crucial modifications to it, although such changes do get worked in as time permits. nonetheless, it is very important that in the initial stages a system be as comprehensively planned as feasible. the parkland college learning resource center is fortunate in that the original specifications (on the whole) were well thought out and provided a cohesive unit, which is also characterized by built-in flexibility and, as a result, is adaptable to future growth.

acknowledgments

numerous individuals have participated in and supported library automation efforts at parkland college. david l. johnson, director of the learning resource center, provided the initial inspiration and determination. robert o. carr, director of the computer center, welcomed the library's commitment to automation and provided technical advice where necessary. sandra lee meyer, acquisitions librarian, gave full cooperation, including tireless aid in clarification of requirements and debugging test results. since late july 1971, bill abraham has been the programmer-operator for the library system and has consistently given more than one hundred percent effort.
jim whitehead from western illinois university contributed valuable advice based on his prior experience in acquisitions automation. finally, kathryn luther henderson, an inspirational teacher and friend, voluntarily spent many hours writing test data and offering the opportunity for many fruitful discussions.

references

1. thomas k. burgess, "criteria for design of an on-line acquisitions system at washington state university library," in proceedings of the 1969 clinic on library applications of data processing, edited by dewey e. carroll (urbana: university of illinois, graduate school of library science, 1970), p. 50-66.
2. alvin c. cage, "data processing applications for acquisitions at the texas southern university library," in proceedings, texas conference on library automation, 1969 (houston: texas library association, acquisitions round table, 1969), p. 35-57.
3. john b. corbin, "the district and its libraries-tarrant county junior college district, fort worth, texas," in proceedings of the 1969 clinic on library applications of data processing, edited by dewey e. carroll (urbana: university of illinois, graduate school of library science, 1970), p. 114-34.
4. t. c. dobb, "administration and organization of data processing for the library as viewed from the computing centre," in proceedings of the 1969 clinic on library applications of data processing, edited by dewey e. carroll (urbana: university of illinois, graduate school of library science, 1970), p. 75-80.
5. connie dunlap, "automated acquisitions procedures at the university of michigan library," library resources & technical services 11:192-202 (spring 1967).
6. robert m. hayes and joseph becker, handbook of data processing for libraries (new york: wiley-becker and hayes, 1970).
7. john f. macpherson, "automated acquisition at the university of western ontario," in automation in libraries. papers presented at the c.a.c.u.l.
workshop on library automation at the university of british columbia, vancouver, april 10-12, 1967 (ottawa, ontario: canadian library association, 1967).
8. ned c. morris, "computer-based acquisitions system at texas a & i university," journal of library automation 1:1-12 (march 1968).
9. louis vagianos, "acquisitions: policies, procedures, and problems," in automation in libraries. papers presented at the c.a.c.u.l. workshop on library automation at the university of british columbia, vancouver, april 10-12, 1967 (ottawa, ontario: canadian library association, 1967), p. 1-9.

158 information technology and libraries | december 2009

michelle frisque, president's message

i know the president's message is usually dedicated to talking about where lita is now or where we are hoping lita will be in the future, but i would like to deviate from the usual path. the theme of this issue of ital is "discovery," and i thought i would participate in that theme. like all of you, i wear many hats. i am president of lita. i am head of the information services department at the galter health sciences library at northwestern university. i also am a new part-time student in the master's program in learning and organizational change at northwestern university. as a student and a practicing librarian, i am now on both sides of the discovery process. as head of the information systems department, i lead the team that is responsible for developing and maintaining a website that assists our health-care clinicians, researchers, students, and staff with selecting and managing the electronic information they need when they need it. as a student, i am a user of a library discovery system. in a recent class, we were learning about the burke-litwin causal model of organization performance and change. the article we were reading described the model; however, it did not answer all of my questions. i thought about my options and decided i should investigate further.
before i continue, i should confess that, like many students, i was working on this homework assignment at the last minute, so the resources had to be available online. this should be easy, right? i wanted to find an overview of the model. i first tried the library's website using several search strategies and browsed the resources in metalib, the library catalog, and libguides with no luck. the information i found was not what i was looking for. i then tried wikipedia without success. finally, as a last resort, i searched google. i figured i would find something there, right? i didn't. while i found many scholarly articles and sites that would give me more information for a fee, none of the results i reviewed gave me an overview of the model in question. i gave up. the student in me thought: it should not be this hard! the librarian in me just wanted to forget i had ever had this experience. this got me to thinking: why is this so hard? libraries have "stuff" everywhere. we access "stuff," like books, journals, articles, images, datasets, etc., from hundreds of vendors and thousands of publishers who guard their stuff and dictate how we and our users can access that stuff. that's a problem. i could come up with a million other reasons why this is so difficult, but i won't. instead, i would like to think about what could be. in this same class we learned about appreciative inquiry (ai) theory. i am simplifying the theory, but the essence of ai is to think about what you want something to be instead of identifying the problems of what is. i decided to put ai to the test and tried to come up with my ideal discovery process. i put both my student and librarian hats on, and here is what i have come up with so far:
- i want to enter my search in one place and search once for what i need. i don't want to have to search the same terms many times in various locations in the hopes one of them has what i am looking for.
- i don't care where the stuff is or who provides the information. if i am allowed to access it i want to search it.
- i want items to be recommended to me on the basis of what i am searching. i also want the system to recommend other searches i might want to try.
- i want the search results to be organized for me. while perusing a result list can be loads of fun because you never know what you might find, i don't always have time to go through pages and pages of information.
- i want the search results to be returned to me in a timely manner.
- i want the system to learn from me and others so that the results list improves over time.
- i want to find the answer.
i'm sure if i had time i would come up with more. while we aren't there yet, we should continually take steps, both big and small, to perfect the discovery process. i look forward to reading the articles in this issue to see what other librarians have discovered, and i hope to learn new things that will bring us one step closer to creating the ultimate discovery experience. michelle frisque (mfrisque@northwestern.edu) is lita president 2009-10 and head, information systems, northwestern university, chicago. book reviews information technology and libraries | march 2014 44. epub 3: best practices, by matt garrish and markus gylling. sebastopol, ca: o'reilly, 2013. 345 pp. isbn: 978-1-449-32914-3. $29.99. there is much of value in this book (there aren't really that many books out right now about the electronic book markup framework, epub 3), yet i have a hard time recommending it, especially if you're an epub novice like me. so much of the book assumes a familiarity with epub 2. if you aren't familiar with this version of the specification, then you will be playing a constant game of catch-up. also, it's clear that the book was written by multiple authors; the chapters are sometimes jarringly disparate with respect to pacing and style. the book as a whole needs a good edit.
this is surprising since o'reilly is almost uniformly excellent in this regard. the first three chapters form the core of the book. the first chapter, "package document and metadata," illustrates how the top-level container of any epub 3 book is the "package document." this document contains metadata about the book as well as a manifest (a list of files included in the package as a whole), a spine (a list of the reading order of the files included in the book), and an optional list of bindings (a lookup list similar to the list of helper applications contained in the configurations of most modern web browsers). the second chapter, "navigation," addresses and illustrates the creation of a proper table of contents, a list of landmarks (sort of an abbreviated table of contents), and a page list (useful for quickly navigating to a specific print-equivalent page in the book). the third chapter, "content documents," is the heart of the core of the book. this chapter addresses markup of actual chapters in a book, pointing out that epub 3 markup here is mostly a subset of html5, but also pointing out such things as the use of mathml for mathematical markup, svg (scalable vector graphics), page layout issues, use of css, and the use of document headers and footers. after reading these first three chapters, my sense is that one is ready to dive into a markup project, which is exactly what i did with my own project. that said, i think a reread of these core chapters is due, which i intend to do presently. the rest of the book is devoted to specialty subjects such as how to embed fonts, use of audio and video clips, "media overlays" (epub 3 supports a subset of smil, the synchronized multimedia integration language, for creating synchronized text/audio/video presentations), interactivity and scripting (with javascript), global language support, accessibility issues, provision for automated text-to-speech, and a nice utility chapter on validation of epub 3 xml files.
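as a rough illustration of the package document structure the review describes (metadata, a manifest of files, and a spine giving reading order), the sketch below renders a bare-bones opf document as a string. this is not an example from the book; the file names and identifier are hypothetical, and a real epub 3 package would need further required metadata.

```python
# minimal sketch of an epub 3 "package document" (opf): metadata,
# a manifest of included files, and a spine (reading order).
# file names and the identifier below are hypothetical.
def package_document(title, identifier, manifest, spine):
    """render a bare-bones opf package document as a string."""
    items = "\n".join(
        f'    <item id="{item_id}" href="{href}" media-type="{mtype}"/>'
        for item_id, (href, mtype) in manifest.items()
    )
    refs = "\n".join(f'    <itemref idref="{item_id}"/>' for item_id in spine)
    return f"""<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{items}
  </manifest>
  <spine>
{refs}
  </spine>
</package>"""

doc = package_document(
    "sample book",
    "urn:uuid:00000000-0000-0000-0000-000000000000",
    {"nav": ("nav.xhtml", "application/xhtml+xml"),
     "c1": ("chapter1.xhtml", "application/xhtml+xml")},
    ["c1"],
)
print(doc.splitlines()[0])
```

the manifest lists everything in the package; the spine lists only the documents in their reading order, which is why the navigation file appears above in the manifest but not the spine.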
of these, the chapter on global language support i found to be fascinating. for us native english speakers, it's not immediately obvious some of the problems one will inevitably encounter when trying to create an electronic publication that can work in non-western languages. just consider languages that read vertically and from right to left, for one! as an epub novice, my greatest desire would be for the book to provide, maybe in an appendix, a fairly comprehensive example of an epub 3 marked-up book. maybe this is a tall order? nevertheless, i would love to see an example of marked-up text including bidirectional footnotes, pagination, a table of contents, etc.; simple, foundational things, really. examples of each of these are included in the book, but not in one place. having such an example in one place would be something that could be used as a quick-start template for us epub beginners. to be fair, code examples of all of this are up on the accompanying website, and i am using these examples as i learn to code epub 3 for my own project. but having a single, relatively comprehensive example as an appendix to the book would be very useful. as i read this book, something kept bothering me. epub 2 and epub 3 are so very different, with reading systems designed to render epub 3 documents being fairly rare at this point. so if different versions of the same spec are so different, with no guarantee that a future reading system will be able to read documents adhering to a previous version, then the prospect of reading epub documents into the future is pretty sketchy. are e-books, then, just convenient and cool mechanisms for currently reading longish narrative prose, convenient and cool, but transitory? mark cyzyk is the scholarly communication architect in the sheridan libraries, johns hopkins university, baltimore, maryland, usa. 78 design principles for a comprehensive library system tamer uluakar, anton r.
pierce, and vinod chachra: virginia polytechnic institute and state university, blacksburg, virginia. this paper describes a project that takes a step-by-step or incremental approach to the development of an online comprehensive system running on a dedicated computer. the described design paid particular attention to present and predicted capabilities in computing as well as to trends in library automation. the resultant system is now in its second of three releases, having tied together circulation control, catalog access, and serial holdings. perspective the use of computers in libraries is no longer a speculative venture for the daring few. rather, library automation has become the accepted prerequisite for effective library service. the question faced is not "if," but rather "how" and "when." the reasons for this evolution are diverse, but fundamental is the recognition of online computer processing as the most effective means of simultaneously handling inventory control, information retrieval, and networking of large, complex, and volatile stores of data. most areas of current library practice could now benefit from effective computer-based control. mature and proven systems exist for cataloging, circulation, serials control, acquisitions, catalog access, and "reader guidance"; the latter by virtue of online literature searching facilities such as dialog, medlars, or brs. the challenge is to find or develop an optimal mix of capabilities. two common limitations from which library automation projects suffer are the use of nonstandardized, incomplete records and the lack of functional integration of different tasks. in most cases these limitations are due to historic circumstances. the pioneering systems, say, those online systems introduced between 1967 and 1975, had to conserve carefully the available computing resources. a decade ago it was unthinkable for any library to store a million marc records online.
manuscript received july 1980; accepted february 1981. design principles/uluakar, et al. 79. mass storage costs alone precluded that option. to best realize the benefits of automation, short records, usually of fixed length, were employed. there is little question that systems based on short records were helpful to their users. however, one characteristic of these systems was their proliferation within a particular library. after the first system was shown to be a success, it became compelling to try another. the problem was that these separate systems were usually not communicating directly with each other because of limitations imposed by program complexity and load on available resources. thus, the use of incomplete records breeds isolated, noncommunicating systems. however, system users have come to demand that all relevant data be available at a single terminal from a single system. it is not enough to know that a particular title is due back in twenty-five days; the user must also know that copy two has just been received, and that copy three is expected to arrive from the vendor in one week. that is, the functions of catalog access, circulation, and acquisitions must be brought together at a single place: the user's terminal. and while the importance of functional integration has been recognized for some time, only a very few report successful implementations.1,2 the kafkaesque alternative to functional integration becomes the library that has been "well computerized" but where the librarian must use five different terminals, one for each task. as computer-based systems have grown to maturity, increasing stress has been placed on standardization. in library automation the measure of standardization is wide-scale use of the marc formats for documents and authorities; the use of bibliographic "registry" entries such as isbn, issn, or coden; the use of standard bibliographic description; and so forth.
however, the application of common languages and standardized protocols, data description, and definition has been less pervasive. we find many applications that eschew use of the common high-level languages, database management systems, and standard "off-the-shelf" or general-purpose hardware. the emergence of powerful and easy-to-use database management systems, the spectacular price reductions in hardware, and the concomitant, and equally spectacular, improvements in system capabilities have made it clear that it is practical to think ambitiously. perhaps the major articulation of these developments has been the pervasive shift from a central computer shared with nonlibrary users to the utilization of dedicated minicomputers.3 our analysis of the requirements of a comprehensive system led to recognition of the key role played by serials in research libraries. serials form the most critical factor in automating library service because of the complexity of their bibliographic, order, and inventory records, and because of their importance to research.4 journal of library automation vol. 14/2 june 1981, p. 80. a fundamental error in designing a comprehensive library system would involve focusing on the requirements of monographs and/or other "one-shot" forms of the literature. the reason is, simply, that monographs and other such publications can be treated as an easy limiting case of a continuing set of publications. this observation is borne out by christoffersson, who reports an application that extends the idea of seriality and develops a means to provide useful control and access to all classes of material.5 design philosophy the concerns outlined above mean that a viable library system should meet the following design criteria: functional integration. functional integration is simply the ability to conduct all appropriate inquiries, updates, and transactions on any terminal.
this envisages a cradle-to-grave system wherein a title is ordered, has its bibliographic record added to the database, is received and paid, has its bibliographic record adjusted to match the piece, is bound, found by author, title, subject, series, etc., charged out, and, alas, flagged as missing. in this way a terminal linked to the system will be a one-stop place to conduct all the business associated with a particular title, subject, series, order, claim, vendor, or borrower. completeness of data. if the system is to be functionally integrated, it is clear that it must carry the data required to support all functions. in particular, data completeness is required to satisfy the access and control functions. consider, for example, the problems associated with the cataloging function. a book is frequently known by several titles or authors. creating these additional access points is a large portion of the cataloger's responsibility. only systems that allow the user access to these additional entries utilize the effort spent in building the catalog record. such system capabilities must be present to allow the labor-intensive card catalog to be closed and, more important, to allow maintenance of the catalog within the system. use of standardized data and networking. in an excellent article, silberstein reminds us that, in general, the primary rationale for adhering to standards is interchangeability.6 we give great importance to being able to project our data to whatever systems may develop in the future. we believe this consideration is of the highest priority because, fundamentally, the only thing that will be preserved into the future is the data itself.* without interchangeability of data, sharing of resources is impossible.
data interchangeability is, of course, a basic assumption that has been made in speculation concerning the national bibliographic network7 developing from the bibliographic utilities, notably oclc, inc., the research libraries group's rlin facility, the washington library network, and the university of toronto's utlas facility. *this state of affairs seems to be true for all computer-based systems because their lifetime is, typically, no greater than ten years. today, nearly all research libraries participate in some utility. while their participation is primarily directed to utilization of the cataloging support services, we find an increasing amount of interest and use of additional capabilities, notably interlibrary loan. we expect a steady and continual growth of these library networking capabilities. however, networking is not problem free. perhaps the biggest single problem in using the network is the misalignment between the record as found on the bibliographic database and the requirements of individual libraries. while such variability between the resource database record and the user's needed version is well understood,8 the local library frequently has a difficult time adjusting records to meet local needs. one example is oclc's inability to "remember" in the online database a particular library's version of a record. another example is the conser project's practice of "locking" very dynamic records as soon as they are authenticated. this locking frequently means that required updates cannot be made and users cannot share with one another corrections to the base record. after locking, each must, independently, go about bringing the record up to date. thus, as roughton notes, "the next library to call up the record loses the benefit of the previous library's work.
"9 this inhospitable state of affairs forces individual libraries to maintain their own records if they wish to change bibliographic records after initial entry. the problem of local adjustment of bibliographic records in no way conflicts with the goal of standardized bibliographic data. standardized data provides a quick means of delivering an intelligible package to a variety of users who will adapt the package to meet their particular needs. standardization does not mean making adaptation inefficient or more costly than it need be; rather, standards provide a framework around which the details are filled in. these observations on standardized data formats imply that the library's data must be based on marc records for books, serials, authorities, etc.; and on the ansi standards for summary serials holdings notation, book numbers, library addresses, and so forth. microscopic data description. at this point, system administrators face a fundamental problem: many of the library's important records have no standard format. the most conspicuous example involves the notation for detailed serials holdings.10 the only alternative one has when trying to build a system without standardized formats is to rely on "microscopic" description. that is, each and every distinct type of data element that makes up (or can make up) a field in a record must be accounted for and uniquely tagged. in this way, whatever standard format is ultimately set, it will be possible, in principle, to assemble by algorithm the data elements into an arrangement that will be in conformity with the standard. only if the library is using microscopic data description will the library be able to maintain its independence of particular lines of hardware or software. we are convinced that the use of untagged, free-form input will, in the long run, spell disaster. use of general purpose hardware and software.
many strategies in dealing with library automation involve redesigning standard hardware or software. for example, one vendor has reported an interesting design of mass storage units that improved access time.11 we feel that future applications should, as much as possible, steer clear of such customized implementations because the standard capabilities of most affordable systems allow sufficient processing power and storage economies even if these capabilities are suboptimal for a particular application. the use of general-purpose hardware and system software promotes system sharing between different installations. moreover, an application based on general-purpose hardware and system software will be easier to maintain and far less vulnerable to changes in personnel. for turnkey installations, the greater the degree of use of general-purpose hardware and software, the better shielded will the installation be against changes in product line or the vendor's ultimate demise. a noteworthy application of this principle of compatibility is seen in the system being developed by the national library of medicine.12 system description the functional capabilities of the virginia tech library system (vtls) have been developed in two software releases, with the third release soon to appear. the initial release met the needs associated with circulation control and also provided rudimentary access to the catalog and serials holdings. the present release has benefited from the use of the marc format, and allows vastly improved catalog access and control. release iii, the comprehensive library system now being developed, will draw together acquisitions, authority control, and serials control with the current capabilities. vtls release i the initial release of the system was developed in 1976 to meet needs generated by rapid library growth.
circulation transactions had been increasing at about 10 percent annually for the previous decade and were straining the manually maintained circulation files beyond acceptable limits. the main library* at virginia tech is organized in subject divisions, each essentially "owning" one floor of a 100,000-square-foot facility. a 100,000-square-foot addition to the library had been approved. because virginia tech's library has only one card catalog, some means was necessary to distribute catalog information throughout a facility that was to double its size. *only two quite small branch libraries (architecture and geology) exist on campus. in addition there is a reserve collection located in the washington, d.c., area that supports off-campus graduate programs in the areas of education, business administration, and computer science. all these sites are linked to the system. after reviewing the alternative means of distributing the catalog (e.g., a duplicate card catalog, photographic reproduction of the catalog, or a com catalog), it was decided to attack both problems, circulation control and remote catalog access, within a single online system. vtls was installed on a full-time basis in august 1976. its first release ran continuously on the library's dedicated hewlett-packard 3000 minicomputer until december 1979. at that time the system held brief bibliographic data for approximately 325,000 monographs and 25,000 journals and other serial titles: records for about half the collection. while the first release ably met its goals, it became clear that it would prove to be an unsuitable host for additional modules involving acquisitions and serials control, primarily because of the brief, fixed-length bibliographic records.
as a result of highly favorable price reductions in computer hardware and improvements in capability, it was possible to think in terms of storing one million marc records online as well as supporting the additional terminals required for a comprehensive library system. vtls release ii vtls runs under a single online program for all real-time transactions. the major goals in the design of this program were the following: 1. two conflicting requirements had to be accommodated: first, the program had to be easy to use for library patrons. this is requisite for a system that will eventually replace the card catalog. second, the program had to be practical, efficient, and versatile for its professional users. the keystrokes required had to be minimal, and related screens had to be easily accessible from one to another. 2. the response time had to be good, especially for more frequent transactions. 3. the contents of all screens had to be balanced to provide enough information without being overcrowded and difficult to read or comprehend. further, each screen of vtls had to be arranged by some logical arrangement of the data it contains; for most screens this meant alphabetical sorting of the data according to ala rules. 4. the format of all screens, especially those to be viewed by the patrons, had to be visually pleasing. thus, the use of special symbols (which are so abundant on many computer system displays), nonstandard abbreviations, and locally (and often quite arbitrarily) defined terms were unacceptable. 5. the program had to have security provisions to restrict certain classes of users from addressing particular modules of the program. considerable effort was spent to satisfy these goals. the first goal was achieved by the "network of screens" approach. the second goal, prompt system response, necessitated the use of the "data buffer method," which, in turn, proved to have other uses (both of these techniques are discussed below). to satisfy goals three and four, a committee of librarians and analysts spent months drafting and reviewing each screen until it was finally approved by the design group. goal five, security provisions, was reached without much difficulty. network of screens vtls's data-access system is designed to be used as easily as a road map. this is accomplished by the use of a "network of screens." the network of screens is much like a road map in which a set of related data (a screen displayed in one or more pages) acts as a "city," and the commands that lead from one set to another act as "highways." vtls has nineteen screens including various menu screens, bibliographic screens (see "the data buffer method" below), serial holdings screens, item (physical piece) screens, and screens for patron-related data. the user can "drive" from one "city" to another using system commands. the system commands are either "global" or "local." global commands, as the name implies, may be entered at any point during the execution of the online program. a local command is peculiar to a given screen. global commands are of two types: search commands and processing commands. search commands are used to access the database by author, title, subject, added entries, call number, lc card number, isbn, issn, patron name, etc. processing commands, on the other hand, initiate procedures such as check-out, renewal, or check-in of items. the user first enters a global (search) command to access one of the screens in the network. from there, local commands that are specific to the current screen can be used. there are three different types of local commands: commands that take the user from one screen to another; commands that page within the current screen; and commands that update data related to the screen.
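the "network of screens" with its global and local commands can be sketched as a small dispatcher: global commands are accepted anywhere, while local commands are valid only on the current screen. the screen names, command names, and transition table below are hypothetical illustrations, not vtls's actual command set.

```python
# sketch of a "network of screens": screens are nodes, global commands
# may be entered from any screen, local commands only from the current
# one. all names here are hypothetical.
GLOBAL_COMMANDS = {"author_search": "author screen",
                   "title_search": "title screen",
                   "checkout": "item screen"}

# local commands: from a given screen, which screens can be reached
LOCAL_COMMANDS = {"author screen": {"show_items": "item screen"},
                  "item screen": {"show_patron": "patron screen"},
                  "patron screen": {"show_activity": "patron activity screen"}}

class Session:
    def __init__(self):
        self.screen = "main menu"

    def enter(self, command):
        if command in GLOBAL_COMMANDS:            # usable from any screen
            self.screen = GLOBAL_COMMANDS[command]
        elif command in LOCAL_COMMANDS.get(self.screen, {}):
            self.screen = LOCAL_COMMANDS[self.screen][command]
        else:
            raise ValueError(f"command {command!r} not valid on {self.screen}")
        return self.screen

s = Session()
s.enter("author_search")   # global command: enter the network anywhere
s.enter("show_items")      # local: author screen -> item screen
s.enter("show_patron")     # local: item screen -> patron screen
print(s.screen)            # patron screen
```

the point of the road-map analogy is visible in the data: the global commands are on-ramps into the network, and each screen's local commands are the highways leading out of that "city."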
for example, it is possible to start by entering an author search command to access the network and then proceed not only to find what books the author has in the system but also the availability of each of the books. if the books are checked out, information about the patrons who have them can also be reached. this display is called the patron screen. from the patron screen, one can "drive" to the patron activity screen, which displays circulation information about the patrons. thus, each displayed screen leads to another. in fact, the searches can start at ten different screens and proceed in many different ways through the network. database design image/3000, hewlett-packard's database management system used by vtls, is designed to be used with fixed-length records. this fact, coupled with the need to sort entries on most screens, created serious problems in the early stages of the system design. but various techniques were devised to overcome these apparent roadblocks. figure 1 illustrates the breakdown of the bibliographic record in the database and the way it is linked with piece-specific data. bibliographic data are stored in three distinct groups for subsequent retrieval: 1. controlled vocabulary terms (authority data set); 2. title and title-like data (title data set); 3. all remaining bibliographic data, i.e., data that is not indexed (marc-other data set). this grouping of the marc record extends to subfields, thus splitting mixed fields such as author-title added entries. when individual fields are parsed in this way, a single field may contribute more than one access point, such as variant forms of author, title, series name, subject, and added entries. access by the standard bibliographic control numbers is effected by use of inverted files (not shown in the figure). a fundamental characteristic of this layout involves the storage of controlled vocabulary terms (i.e., authors and subjects).
regardless of the number of references made to an authority term from different bibliographic records, the controlled vocabulary term is stored only once. the system assigns a unique number (authority id) to each such term and uses this number to keep records of the references made to it in a separate data set (authority-bibliographic linkage data set). this particular structure makes an authority control subsystem possible, speeds up online retrieval and display, and economizes mass storage. fig. 1. bibliographic layout of the vtls database (simplified). the data buffer method the system displays bibliographic records in two different formats. if the terminal used is designated for librarians, the records are displayed in the marc format (the resulting screen is referred to as the marc screen); otherwise, they are displayed in a screen that is formatted similar to a catalog card. before displaying these screens, the online program collects and formats the data to be displayed and stores it in one of the two "buffer" data sets. the records stored in the buffer data sets are called buffer records. buffer records can be edited, as required, by adding new lines, deleting, or modifying existing character strings. these updates can be executed quickly and without placing much load on the system since they involve little, if any, analysis, indexing, and sorting. thus, the buffer data sets store all bibliographic updates and new data entry of the day. at night, these records are transferred to the rest of the database by a batch program. the data buffer method has had several pronounced effects on the system. by transferring periods of heavy resource demand to off-hours, the system can work with full marc records in a library that has a heavy real-time load of data entry, inquiry, and circulation.
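the data buffer idea (cheap daytime edits against pre-formatted buffer records, with analysis, indexing, and sorting deferred to a nightly batch merge) can be sketched roughly as follows. the record structures here are hypothetical simplifications, not vtls's actual data sets.

```python
# sketch of the "data buffer method": daytime edits touch only small
# buffer records; re-indexing is deferred to a nightly batch merge into
# the main database. structures are hypothetical.
class Catalog:
    def __init__(self):
        self.database = {}   # record id -> list of display lines
        self.buffer = {}     # today's formatted/edited records

    def display(self, rec_id):
        # a search is satisfied from the buffer if the record was
        # already formatted (or edited) today
        if rec_id in self.buffer:
            return self.buffer[rec_id]
        lines = list(self.database.get(rec_id, []))
        self.buffer[rec_id] = lines          # cache for later searches
        return lines

    def edit(self, rec_id, line_no, text):
        # cheap in-place edit of the buffer record; no indexing now
        self.display(rec_id)
        self.buffer[rec_id][line_no] = text

    def nightly_batch(self):
        # transfer the day's buffer records into the main database,
        # where full analysis and indexing would happen off-hours
        for rec_id, lines in self.buffer.items():
            self.database[rec_id] = lines
        self.buffer.clear()

cat = Catalog()
cat.database["b1"] = ["100 smith, john", "245 a sample title"]
cat.edit("b1", 1, "245 a corrected title")
cat.nightly_batch()
print(cat.database["b1"][1])   # 245 a corrected title
```

note how the two effects described in the text fall out of this shape: heavy work is moved to the batch step, and a second display of the same record during the day is served from the buffer rather than rebuilt.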
the data buffer approach also improves access efficiency because once a buffer record is prepared for a screen, subsequent searches for the same record are satisfied by the buffer record. data entry and the oclc interface the most frequently encountered method of entering marc records into a local computer involves use of tape in the marc ii communications format. alternative methods include the use of microprocessors or digital recorders which "play back" a marc-tagged screen image from oclc or some other bibliographic utility. these alternative methods have the strong advantage of shortening the delay introduced while waiting for a tape to be delivered. we have been able to link the utility's terminal to the data buffer.13 data flows from the utility to the buffer in real time. no intervention in the utility's terminal was required for the local processor to be able to capture the marc-tagged screen. batch programs running on the hp 3000 read records from printer ports of oclc terminals and pass them directly to the data buffer. once a record gets into the data buffer, it is accessible by oclc number so that subsequent editing and linkage to piece-specific data or serial holdings can be made right away in the local system. buffer records can also be created by direct keyboarding of the full array of fixed and variable fields using the vtls terminals. circulation as with most other online circulation systems, vtls uses machine-sensible bar-code labels to identify books and borrowers to the system. all efforts have been made to humanize the system. one consequence is that the system does not make decisions better made by responsible staff. thus, two kinds of circulation stations reside side by side. the first is staffed by students who typically work a ten-to-twenty-hour week and historically have shown high turnover.
their circulation stations deal only with inquiries and with heavily used but nondiscretionary transactions: check-out, renewal, and check-in. should problems arise, the borrower is directed to the adjacent station staffed by a full-time employee who, using the system, can articulate circulation policy to borrowers and make decisions with regard to any questions concerning fines, lost books, or reinstatement of invalidated or blocked privileges.

start-up

we found system start-up to be a relatively easy task. it was convenient to use the so-called rolling conversion, in which items were labeled upon their initial circulation through the system. the greatest benefit was seen in the first year, when the probability that items brought to the circulation desk were already known to the system increased exponentially. after six months this probability had risen to 65 percent, with only 10 percent of the circulating collection having been labeled. at the end of the year the probability increased linearly at 0.7 percent per month. after three years of operation, the probability was 90 percent, with approximately 50 percent of the circulating collection having been labeled.

reference use

the ability to distribute catalog access as well as circulation information provides a powerful information tool. a subset of all functions previously described is available to the nonlibrarian users of the system through user-cordial screens. a "help" function may also be initiated at any screen to guide users through the network of screens.

current development

critical to the overall design of vtls is the system's ability to treat serials and continuations. without this capability, the modules being developed to support acquisitions, serials check-in and claiming, and binding will not function satisfactorily. equally important, the design lays the foundation for authority control by virtue of its use of a dictionary for all controlled vocabulary terms.
thus a name or subject entry is carried internally as a four-byte code, which is translated to the authority entry upon display. another internally coded data element, the bib-id, is designed to handle many of the linkage problems associated with serials and continuations. the bib-id is unique for each marc record. prior to establishing the serials control modules governing receipt, claiming, and binding, the coded holdings module must be functioning. this module will allow automatic identification of volume (or binding unit) closure and automatic identification of gaps in holdings or overdue receipts. thus, highest priority has been given to the development of this module so that these other modules can, in turn, develop. the holdings module serves two functions: first, it allows the detailed recording of serials holdings consistent with the principle stated earlier concerning microscopic data description; and second, these microscopic data are coded so that the system can recognize (and predict) particular pieces or binding units in terms of enumerative and chronological data. the next three areas of development are modules for acquisitions and fund control, serials receipts and binding, and authority control. the final development will be comprehensive management reports. it should be noted that each one of these developments will result in a specific benefit to the user community. the project is incremental in that the development of area a does not mean that area b must be developed for a to have lasting value. this incremental approach offers designers and administrators the advantages associated with an orderly growth in complexity and budget requirements. further, the capabilities of the host hardware and software are stressed in smaller steps than would be the case if the comprehensive system were written and then turned on.
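the holdings module's use of coded enumerative data — predicting the next expected piece and flagging gaps in holdings — can be illustrated with a short sketch. this is assumed logic for illustration only, not the vtls module; it supposes issues are coded as (volume, issue) pairs with a known number of issues per volume.

```python
# Illustrative sketch (assumed logic, not the VTLS holdings module):
# pieces are coded as (volume, issue) tuples, with issue numbers
# assumed to run from 1 to issues_per_volume within each volume.

def next_issue(volume, issue, issues_per_volume):
    """Predict the enumeration of the next expected piece."""
    if issue < issues_per_volume:
        return (volume, issue + 1)
    return (volume + 1, 1)  # volume closure: a new volume begins

def find_gaps(received, issues_per_volume):
    """Flag expected pieces missing from a run of received issues."""
    pieces = set(received)
    current, last = min(pieces), max(pieces)
    gaps = []
    while current != last:
        current = next_issue(*current, issues_per_volume)
        if current != last and current not in pieces:
            gaps.append(current)
    return gaps

# e.g., a quarterly: vol. 14 no. 3 never arrived
missing = find_gaps([(14, 1), (14, 2), (14, 4), (15, 1)], 4)
```

because the enumeration is coded rather than free text, both volume closure (issue 4 of 4 received) and gaps (issue 3 missing) fall out of simple comparisons, which is the point the article makes about microscopic, machine-recognizable holdings data.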
the key move appears to be predefining the scope and capabilities of each stage so that a useful product emerges at its completion, and so that it lays a foundation for the next.

references

1. velma veneziano and james s. aagaard, "cost advantages of total system development," in proceedings of the 1976 clinic on library applications of data processing (urbana, ill.: university of illinois press, 1976), p. 133-44.
2. charles payne and others, "the university of chicago data management system," library quarterly 47:1-22 (jan. 1977).
3. audrey n. grosch, minicomputers in libraries (new york: knowledge industry press, 1979), 142p.
4. richard degennaro, "wanted: a mini-computer serials system," library journal 102:878-79 (april 15, 1977).
5. john g. christoffersson, "automation at the university of georgia libraries," journal of library automation 12:23-38 (march 1979).
6. stephen m. silberstein, "standards in a national bibliographic network," journal of library automation 10:142-53 (june 1977).
7. network technical architecture group, "message delivery system for the national library and information service network: general requirements," in david c. hartmann, ed., library of congress network planning paper, no. 4, 1978, 35p.
8. arlene t. dowell, cataloging with copy (littleton, colo.: libraries unlimited, 1976), 295p.
9. michael roughton, "oclc serials records: errors, omissions, and dependability," journal of academic librarianship 5:316-21 (jan. 1980).
10. tamer uluakar, "needed: a national standard for machine-interpretable representation of serial holdings," rtsd newsletter 6:34 (may/june 1981).
11. c.l. systems, inc., "the libs 100 system: a technological perspective," clsi newsletter, no. 6 (fall/winter 1977).
12.
lister hill national center for biomedical communications, national library of medicine, "the integrated library system: overview and status" (lhc/ctb internal documentation, bethesda, md., october 1, 1979), 55p.
13. francis j. galligan to pierce, 11 feb. 1980.

tamer uluakar is manager of the virginia tech library automation project. anton r. pierce is planning and research librarian at the university libraries. vinod chachra is director of computing resources and associate professor of industrial engineering.

using the harvesting method to submit etds into proquest: a case study of a lesser-known approach

marielle veve
information technology and libraries | september 2020
https://doi.org/10.6017/ital.v39i3.12197

marielle veve (m.veve@unf.edu) is metadata librarian, university of north florida. © 2020.

abstract

the following case study describes an academic library's recent experience implementing the harvesting method to submit electronic theses and dissertations (etds) into the proquest dissertations & theses global database (pqdt). in this lesser-known approach, etds are deposited first in the institutional repository (ir), where they get processed, to be later harvested for free by proquest through the ir's open archives initiative (oai) feed. the method provides a series of advantages over some of the alternative methods, including students' choice to opt in or out from proquest, better control over the embargo restrictions, and more customization power without having to rely on overly complicated workflows. institutions interested in adopting a simple, automated, post-ir method to submit etds into proquest, while keeping the local workflow, should benefit from this method.

introduction

the university of north florida (unf) is a midsize public institution established in 1972, with the first theses and dissertations (tds) submitted in 1974.
since then, copies have been deposited in the library, where bibliographic records are created and entered in the library catalog and the online computer library center (oclc). during the period of 1999 to 2012, some tds were also deposited in proquest by the graduate school on behalf of students who chose to do so. this practice, however, was discontinued in the summer of 2012, when the institutional repository, digital commons, was established and submission to it became mandatory. five years later, in the summer of 2017, interest in getting unf tds hosted in proquest resurfaced. this renewed interest grew out of a desire of some faculty and graduate students to see the institution's electronic theses and dissertations (etds) posted there, in addition to a recent library subscription to the proquest dissertations & theses global database (pqdt). a month later, conversations between the library and graduate school began on the possibility of resuming hosting unf etds in proquest. consensus was reached that the pqdt database would be a good exposure point for our etds, in addition to the institutional repository (ir), yet some concerns were raised. one of the concerns was the cost of the service and who would be paying for it. neither the library nor the graduate school had allocated funds for this. the next concern was the possibility of proquest imposing restrictions that could prevent students, or the university, from posting etds in other places. it was important to make sure there were no such restrictions. another concern was expressed over students entering embargo dates in proquest that do not match the embargo dates selected for the ir. this is a common problem encountered by other libraries.1 for that reason, we wanted to keep the local workflow. the last concern expressed during the conversations was preserving students' right to opt in or out from distributing their theses in proquest.
this is something both the graduate school and library have been adamant about. in higher education, requiring students to submit to proquest is a controversial issue which has raised ethical concerns and has been highly debated over the years.2 once conversations between the library and graduate school were held and concerns were gathered, the library moved ahead to investigate the available options to submit etds into proquest.

literature review

currently, there are three options to submit etds into proquest: (1) submission through the proquest etd administrator tool, (2) submission via file transfer protocol (ftp), and (3) submission through harvests performed by proquest.3

proquest etd administrator submission option

in this option, a proprietary submission tool called proquest etd administrator is used by students, or assigned administrators, to upload etds into proquest. inside the tool, a fixed metadata form is completed with information on the degree, subject terms are selected from a proprietary list, and keywords are provided. the whole administrative and review process gets done inside the tool. afterwards, zip packages with the etds and proquest's extensible markup language (xml) files are sent to the institution via ftp transfers, or through direct deposits to the ir using the simple web-service offering repository deposit (sword) protocol. the etd administrator submission method presents several shortcomings.
first, the proquest xml metadata that is returned to the institutions must be transformed into ir metadata for ingest in the ir, a process that can be long and labor intensive.4 second, the subject terms supplied in the returned files come from a proprietary list of categories maintained by proquest, which does not match the library of congress subject headings (lcsh) used by libraries.5 third, control over the metadata provided is lost because the metadata form cannot be altered, plus customizations to other parts of the system can be difficult to integrate.6 fourth, there have been issues with students indicating different embargo periods in the proquest and ir publishing options, with instances of students choosing to embargo etds in the ir, while not in proquest.7 lastly, this method does not allow students' choice, unless the etds are submitted separately in two systems in a process that can be burdensome. ultimately, for these reasons, we found the etd administrator not a suitable option for our institution.

ftp submission option

in this option, an administrator sends zip packages with the institution's etd files and proquest xml metadata to proquest via ftp.8 at the time of this investigation, there was a $25 charge per etd submitted through this method.9 we did not want to pursue this option because of the charge and the tedious metadata transformations that would be needed between ir and proquest xml schemas. another way around this would have been to submit the etds through the vireo application. vireo is an open source etd management system used by libraries to freely submit etds into proquest via ftp.10 this alternative, however, was not an option for us, as our ir, digital commons, does not support the vireo application.

harvesting submission option

this is the latest method available to submit etds into proquest.
in this option, etds are submitted first into an ir, or other internal system, where they get processed to be later harvested by proquest through the ir's existing open archives initiative (oai) feed.11 at the time of this writing, we were not able to find a single study that documents the use of this method. this option looked appealing and worth pursuing as it met most of our desired criteria. first, with this option, students' choice would not be compromised, as etds would be submitted to proquest after being posted in the ir. second, because the etd administrator would not be used, issues with conflicting embargo dates and unalterable metadata forms would be avoided. in addition, the local workflow would be retained, thus eliminating the need for tedious metadata transformations between proquest and ir schemas. from the available options, this one seemed the most feasible solution for our institution.

implementation of the harvesting method at unf

after research on the different submittal options was performed, the library approached proquest to express interest in depositing our future etds into their system by using a post-ir option. in the first communications, proquest suggested we use the etd administrator to submit etds because it is the most commonly used method. when we expressed interest in the harvesting option, they said "we have not been harvesting from bepress sites" (the company that makes digital commons) and suggested we use the ftp option instead.12 ten months later, they clarified that the harvests could be performed from bepress sites and that the option is free, with the only requirement being a non-exclusive agreement between the university and proquest.
the news allayed both the library's and the graduate school's previous concerns, as we would be able to adopt a free method that would not compromise on students' choice nor restrict students from posting in other places, while keeping the local workflow. after agreement on the submittal method was established, planning and testing of the harvesting method began. the library worked with proquest and bepress to customize the harvesting process, while the university's office of the general counsel worked with proquest on the negotiation process.

negotiation process

before proquest could harvest unf etds, two legal documents needed to be in place. the first document was the theses and dissertations distribution agreement, which specifies the conditions under which etds can be obtained, reproduced, and disseminated by proquest. the document had to be signed by unf's board of trustees and proquest. the agreement stipulated the following conditions:

• the agreement must be non-exclusive.
• the university must make the full-text uniform resource locators (urls) and abstracts of etds available to proquest.
• proquest must harvest the etds from the university's ir.
• the university and students have the option to elect not to submit individual works or to withdraw them.
• no fees are due from the university or students for the service.
• proquest must include the etds in the pqdt database.

the second document that needed to be in place was the theses and dissertations availability agreement, which grants the university the non-exclusive right to reproduce and distribute the etds. this agreement between students and unf specifies the places where etds can be hosted and the embargo restrictions, if any.
unf had already been using this document as part of its etd workflow, but the document needed to be modified to include the additional option to submit etds into proquest. beginning with the spring 2019 semester, the revised version of the agreement provided students with two hosting alternatives: posting in the ir only, or in the ir and proquest.

local steps performed before the harvesting

the workflow begins when students upload their etds and supplemental files (certificate of approval and availability agreements) directly into the digital commons ir. there, students complete a metadata template with information on the degree, and keywords related to the thesis are provided. after this, the graduate school reviews the submitted etds and approves them inside the ir platform. next, the library digital projects' staff downloads the native pdf files of the etds, processes them, and creates public and archival versions for each etd. availability agreements are reviewed to determine which students chose to embargo their etds and which ones chose to host them in proquest, in addition to the ir. if students choose to embargo their etds, the embargo dates are entered in the metadata template. if students choose to publish their etds in proquest, a "proquest: yes" option is checked in their metadata template, while students who choose not to host in proquest get a "proquest: no" in their template. (the proquest field is a new administrative field that was added to the etd metadata template, starting with the spring 2019 semester, to assist with the harvesting process. it was designed to alert proquest of the etds that were authorized for harvesting. more detail on its functionality will be provided in the next section.)
the reason library staff enters the proquest and embargo fields on behalf of students is to avoid having students enter incorrect data on the template. following this review, the metadata librarian assigns library of congress subject headings to each etd and creates authority files for the authors. these are also entered in the metadata template. afterwards, the etds get posted in the digital commons' public display, with the full-text pdf files available only for the non-embargoed etds. information that appears in the public display of digital commons will also appear immediately in the oai feed for harvesting. at this point, two separate processes take place:

1. the metadata librarian harvests the etds' metadata from the oai feed and converts it into marc records that are sent to oclc, with the ir's url attached. the workflow is described at https://journal.code4lib.org/articles/11676.
2. on the seventh of each month, proquest harvests the full-text pdf files, with some metadata, of the non-embargoed etds that were authorized for harvesting from the oai feed.

harvesting process (customized for our institution)

to perform the harvests, proquest creates a customized robot for each institution that crawls oai-pmh compliant repositories to harvest metadata and full-text pdf files of etds.13 the robot performs a date-limited oai request to pull everything that has been published or edited in an ir's publication set during a specific timeframe. information to formulate the date-limited request is provided to proquest by the institution for the first harvest only; subsequently, the process gets done automatically by the robot.
the request contains the following elements:

• base url of the oai repository
• publication set
• metadata prefix or type of metadata
• date range of titles to be harvested

in the particular case of our institution, we needed to customize the robot to limit the harvests to authorized etds only. to achieve this, we worked with bepress to add a new, hidden field at the bottom of our digital commons' etd metadata template. the field, called proquest, consisted of a dropdown menu with two alternatives: "proquest yes" or "proquest no" (see figure 1). the field was mapped to an element in the oai feed that displays the value of "proquest: yes" or "proquest: no," thus alerting the robot of the etds that were authorized for harvesting and the ones that were not. the element used to map the proquest field in the oai feed is a qualified dublin core (qdc) element (figure 2). for that reason, the robot needs to perform the harvests from the qdc oai feed in order to see this field.

[figure 1. display of the proquest field's dropdown menu in the metadata template]
[figure 2. display of the proquest field in the qdc oai feed]

after the etds authorized for harvesting have been identified with help from the "proquest: yes" field, the robot narrows down the ones that can be harvested at the present moment by using a second element, which provides the date when the full-text file of an etd becomes available. it also displays in the qdc oai feed (see figure 3). if the date is on or before the monthly harvest day, the etd is currently available for harvesting. if the date is in the future, the robot identifies that etd as embargoed and adds its title to a log of embargoed etds with some basic metadata (including the etd's author and the last time it was checked).
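the date-limited request and the selection logic described above can be sketched as follows. the oai-pmh verb and parameter names (verb, set, metadataprefix, from, until) are standard; everything else — the repository url, set name, metadata prefix, and the dictionary fields standing in for parsed qdc records — is a hypothetical placeholder, not proquest's actual implementation.

```python
# Sketch of the harvest selection described in the text (assumed logic).
from datetime import date
from urllib.parse import urlencode

def list_records_url(base_url, set_spec, prefix, from_date, until_date):
    """Build a date-limited OAI-PMH ListRecords request URL."""
    params = {
        "verb": "ListRecords",       # standard OAI-PMH verb
        "set": set_spec,             # the IR's publication set
        "metadataPrefix": prefix,    # e.g., a QDC-style prefix
        "from": from_date,           # date range of titles to harvest
        "until": until_date,
    }
    return f"{base_url}?{urlencode(params)}"

def select_for_harvest(records, harvest_day):
    """Split parsed records into harvestable-now vs. embargo-logged."""
    harvest, embargo_log = [], []
    for rec in records:
        if rec["proquest"] != "yes":          # not authorized: skip
            continue
        if rec["available"] <= harvest_day:   # availability date passed
            harvest.append(rec["title"])
        else:                                 # future date: log it
            embargo_log.append(rec["title"])
    return harvest, embargo_log

url = list_records_url("https://example.edu/do/oai/", "publication:etd",
                       "qdc", "2019-05-01", "2019-06-07")
```

the two checks mirror the article's description exactly: the "proquest: yes" flag gates authorization, and the availability-date comparison against the monthly harvest day decides between immediate harvest and the embargo log.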
the log of embargoed etds is then pulled up in the future to identify the etds that come out of embargo so the robot can retrieve them.

[figure 3. display of the element in the qdc oai feed]

after the etds that are currently available for harvesting have been identified (because they have the "proquest: yes" field and a present or past availability date), the robot performs a harvest of their full-text pdf files by using a third element, which displays at the bottom of records in the oai feed (figure 4). this third element contains a url with direct access to the complete pdf file of etds that are currently not embargoed. etds that are currently on embargo contain a url that redirects the user to a webpage with the message: "the full-text of this etd is currently under embargo. it will be available for download on [future date]" (see figure 5).

[figure 4. display of the third element at the bottom of records in the qdc oai feed]
[figure 5. message that displays in the url of embargoed etds]

once the metadata and full-text pdf files of authorized, non-embargoed etds have been obtained by the robot, they get queued for processing by the proquest editorial team, who then assigns them international standard book numbers (isbns) and proquest's proprietary terms. it takes an average of four to nine weeks for the etds to display in the pqdt database after being harvested. records in the pqdt come with the institutional repository's original cover page and a copyright statement that leaves copyright with the author. afterwards, the process gets repeated once a month. this frequency can be set to quarterly or semi-annually if desired.

additional points on the harvesting method

handling of etds that come out of embargo.
when the embargo period of an etd expires, its full-text pdf automatically becomes available on the ir's webpage, and consequently, in the third element that displays in the oai record. each month, when the robot prepares to crawl the oai feed, it will first check the titles in the log of embargoed etds to determine if any of them have become fully available through the third element. the ones that become available are then pulled by the robot through this element.

handling of metadata edits performed after the etds have been harvested and published in pqdt.

edits performed to the metadata of etds will trigger a change of date in a date element that displays in the oai records. this change of date alerts the robot that an update took place in a record, which is then manually edited or re-harvested, depending on the type of update that took place.

sending marc records to oclc.

as part of the harvesting process, proquest provides free marc records for the etds hosted in their pqdt database. these can be delivered to oclc on behalf of the institution on an irregular basis. records are machine-generated "k" level and come with urls that link to the pqdt database and with proquest's proprietary subject terms. we requested to be excluded from these deliveries and to continue our local practice of sending marc records to oclc with lcsh, authority file headings, and the ir's urls.

notifications of harvests performed by proquest and imports to the pqdt database.

when harvests or imports to the pqdt have been performed by proquest, institutions do not get automatically notified. still, they can request to receive scheduled monthly reports of the titles that have been added to the pqdt. unf requested to receive these monthly reports.

usage statistics of etds hosted in pqdt.

usage statistics of an institution's etds hosted in the pqdt can be retrieved from a tool called dissertation dashboard.
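the monthly recheck of the embargo log described above might look like the following sketch. the logic and field names are assumptions for illustration, not proquest's implementation: each logged entry carries an availability date, and on the harvest day the robot splits the log into titles now due for retrieval and titles still embargoed (whose last-checked date is refreshed, as the article notes the log records when each entry was last checked).

```python
# Sketch of the embargo-log recheck (assumed logic and field names).
from datetime import date

def recheck_embargo_log(log, harvest_day):
    """Split the embargo log into (now harvestable, still embargoed)."""
    due = [e for e in log if e["available"] <= harvest_day]
    still = [dict(e, last_checked=harvest_day)  # refresh check date
             for e in log if e["available"] > harvest_day]
    return due, still

log = [
    {"title": "thesis a", "available": date(2020, 1, 1)},
    {"title": "thesis b", "available": date(2021, 1, 1)},
]
due, still = recheck_embargo_log(log, date(2020, 6, 7))
```

entries in `due` would then be pulled through the full-text url element, while `still` is carried forward to the next monthly crawl.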
this tool is available to the institution's etd administrators and provides the number of times some aspect of an etd (e.g., citations, abstract viewings, page previews, and downloads) has been accessed through the pqdt database.

royalty payments to authors.

students who submit etds through this method are also eligible to receive royalties from proquest.

obstacles faced

during the planning phase, we encountered some obstacles that hindered progress on the implementation. these were:

• the amount of time it took to get the ball rolling. initially, we were misled by the assumption that we would not be able to use the harvesting method to submit etds into proquest because we were bepress users, as we were originally told, but that ended up not being the case. ten months later, we were notified by the same source that the harvesting option for bepress sites would be possible and doable by proquest. these were ten months that delayed the implementation process.
• the amount of time it took to get the paperwork finalized and signed before the harvesting. from the moment first contact was initiated with proquest to the moment the last agreement was finalized and signed by both parties, 21 months went by. there was a lot of back and forth in the negotiation process and paperwork between the university and proquest.
• inconsistent lines of communication. there were multiple parties involved in the communication process, and some of the emails began with one person only to be later transferred to someone else. this lack of consistency in the communication lines made it difficult to determine who was in charge of particular tasks at certain stages of the process.

conclusion and recommendations

although problems were encountered at the beginning, implementation of the harvesting process at unf was a complete success.
once the process started, it ran smoothly without complications. harvests were performed on schedule, and no issues with unauthorized content being pulled from the oai were faced. the fields used to alert the robot in the oai of the etds authorized for harvesting worked as planned, and so did the embargo log used to identify and pull the out-of-embargo etds. it should be noted that digital commons users who want to exclude embargoed etds from displaying in the oai can do so by setting up an optional yes/no button in their submission form. this button prevents the metadata of particular records from displaying in the oai feed. we did not pursue this option because we have been using the etd metadata that displays in the oai to generate the marc records we send to oclc. in addition, we took the necessary precautions to avoid exposing the full content of the embargoed etds in the oai feed. institutions planning to use this method should be very careful with the content they display in the oai so as to prevent embargoed etds from being mistakenly pulled by proquest. access restrictions can be set by either suppressing the metadata of embargoed etds from displaying in the oai or by suppressing the urls with full access to the embargoed etds. the same precaution should be taken if planning to provide students with the choice to opt in or out from proquest. altogether, the harvesting option proved to be a reliable solution to submit etds into proquest without having to compromise on students' choice or rely on complicated workflows with metadata transformations between ir and proquest schemas. institutions interested in adopting a simple, automated, post-ir method, while keeping the local workflow, should benefit from this method.
endnotes

1 dan tam do and laura gewissler, "managing etds: the good, the bad, and the ugly," in what's past is prologue: charleston conference proceedings, eds. beth r. bernhardt et al. (west lafayette, in: purdue university press, 2017), 200-04, https://doi.org/10.5703/1288284316661; emily symonds stenberg, september 7, 2016, reply to wendy robertson, "anything to watch out for with etd embargoes?," digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj.

2 gail p. clement, "american etd dissemination in the age of open access: proquest, noquest, or allowing student choice," college & research libraries news 74, no. 11 (december 2013): 562-66, https://doi.org/10.5860/crln.74.11.9039; fuse, 2012-2013, graduate students re-fuse!, https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y.

3 "pqdt submissions options for universities," proquest, http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf.

4 meghan banach bergin and charlotte roh, "systematically populating an ir with etds: launching a retrospective digitization project and collecting current etds," in making institutional repositories work, eds. burton b. callicott, david scherer, and andrew wesolek (west lafayette, in: purdue university press, 2016), 127-37, https://docs.lib.purdue.edu/purduepress_ebooks/41/.

5 cedar c. middleton, jason w. dean, and mary a. gilbertson, "a process for the original cataloging of theses and dissertations," cataloging and classification quarterly 53, no. 2 (february 2015): 234-46, https://doi.org/10.1080/01639374.2014.971997.
6 wendy robertson and rebecca routh, "light on etd's: out from the shadows" (presentation, annual meeting for the ila/acrl spring conference, cedar rapids, ia, april 23, 2010), http://ir.uiowa.edu/lib_pubs/52/; yuan li, sarah h. theimer, and suzanne m. preate, "campus partnerships advance both etd implementation and ir development: a win-win strategy at syracuse university," library management 35, no. 4/5 (2014): 398-404, https://doi.org/10.1108/lm-09-2013-0093.

7 do and gewissler, "managing etds," 202; banach bergin and roh, "systematically populating," 134; donna o'malley, june 27, 2017, reply to andrew wesolek, "etd embargoes through proquest," digital commons google users group (blog), https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj.

8 gail p. clement and fred rascoe, "etd management & publishing in the proquest system and the university repository: a comparative analysis," journal of librarianship and scholarly communication 1, no. 4 (august 2013): 8, http://doi.org/10.7710/2162-3309.1074.

9 "u.s. dissertations publishing services: 2017-2018 fee schedule," proquest.
https://doi.org/10.5703/1288284316661 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20dates%7csort:date/digitalcommons/rningtrarny/6byzt9apaqaj https://doi.org/10.5860/crln.74.11.9039 https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/152270/graduate%20students%20re-fuse.pdf?sequence=25&isallowed=y http://contentz.mkt5049.com/lp/43888/382619/pqdtsubmissionsguide_0.pdf https://docs.lib.purdue.edu/purduepress_ebooks/41/ https://doi.org/10.1080/01639374.2014.971997 http://ir.uiowa.edu/lib_pubs/52/ https://doi.org/10.1108/lm-09-2013-0093 https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj https://groups.google.com/forum/#!searchin/digitalcommons/embargo$20proquest%7csort:date/digitalcommons/gadwi8infga/sg7de7sdcaaj http://doi.org/10.7710/2162-3309.1074 information technology and libraries september 2020 using the harvesting method to submit etds into proquest | veve 10 10 “support: proquest export documentation,” vireo users group, https://vireoetd.org/vireo/support/proquest-export-documentation/. 11 “pqdt global submission options, institutional repository + harvesting,” proquest, https://media2.proquest.com/documents/dissertations-submissionsguide.pdf. 12 marlene coles, email message to author, january 19, 2018. 13 “proquest dissertations & theses global harvesting process,” proquest. 
smartphones: a potential discovery tool
wendy starkweather and eva stowers
the anticipated wide adoption of smartphones by researchers is viewed by the authors as a basis for developing mobile-based services. in response to the unlv libraries' strategic plan's focus on experimentation and outreach, the authors investigate the current and potential role of smartphones as a valuable discovery tool for library users.
when the dean of libraries announced a discovery mini-conference at the university of nevada las vegas libraries to be held in spring 2009, we saw the opportunity to investigate the potential use of smartphones as a means of getting information and services to students. being enthusiastic users of apple's iphone, we, along with the web technical support manager, developed a presentation highlighting the iphone's potential value in an academic library setting.
because wendy is unlv libraries' director of user services, she was interested in the applicability of smartphones as a tool for users to more easily discover the libraries' resources and services. eva, as the health sciences librarian, was aware of a long tradition of pda use by medical professionals. indeed, first-year bachelor of science nursing students are required to purchase a pda bundled with select software. together we were drawn to the student-outreach possibilities inherent in new smartphone applications such as twitter, facebook, and myspace.
■ presentation
our brief review of the news and literature about mobile phones in general provided some interesting findings and served as a backdrop for our presentation:
■ a total of 77 percent of internet experts agreed that the mobile phone would be "the primary connection tool" for most people in the world by 2020.1 the number of smartphone users is expected to top 100 million by 2013. there are currently 25 million smartphone users, with sales in north america having grown 69 percent in 2008.2
■ smartphones offer a combination of technologies, including gps tracking, digital cameras, and digital music, as well as more than fifty thousand specialized apps for the iphone and new ones being designed for the blackberry and the palm pre.3 the palm pre offered fewer than twenty applications at its launch, but one million application downloads had been performed by june 24, 2009, less than a month after launch.4
■ the 2009 horizon report predicts that the time to adoption of these mobile devices in the educational context will be "one year or less."5
data gathered from campus users also was presented, providing another context. in march 2009, a survey of university of california, davis (uc-davis) students showed that 43 percent owned a smartphone.6 uc-davis is participating in apple's university education forum.
here at unlv, 37 percent of students and 26 percent of faculty and staff own a smartphone.7 the presentation itself highlighted the mobile applications that were being developed in several libraries to enhance student research, provide library instruction, and promote library services. two examples were abilene christian university (http://www.acu.edu/technology/mobilelearning/index.html), which in fall 2008 distributed iphones and ipod touches to the incoming freshman class, and stanford university (http://www.stanford.edu/services/wirelessdevice/iphone/), which participates in "itunes u" (http://itunes.stanford.edu/). if the libraries were to move forward with smartphone technologies, they would be following the lead of such universities. readers also may be interested in joan lippincott's recent concise summary of the implications of mobile technologies for academic libraries as well as the chapter on library mobile initiatives in the july 2008 library technology report.8
■ goals: a balancing act
ultimately the goal for many of these efforts is to be where the users are. this aspiration is spelled out in unlv libraries' new strategic plan relating to infrastructure evolution, namely, "work towards an interface and system architecture that incorporates our resources, internal and external, and allows the user to access from their preferred starting point."9 while such a goal is laudable and fits very well into the discovery emphasis of the mini-conference presentation, we are well aware of the need for further investigation before proceeding directly to full-scale development of a complete suite of mobile services for our users. of critical importance is ascertaining where our users are and determining whether they want us to be there and in what capacity. the value of this effort is demonstrated in booth's research report on student interest in emerging technologies at ohio university.
wendy starkweather (wendy.starkweather@unlv.edu) is director, user services division, and eva stowers (eva.stowers@unlv.edu) is medical/health sciences librarian at the university of nevada las vegas libraries.
188 information technology and libraries | december 2009
the report includes the results of an extensive environmental survey of their library users. the study is part of ohio university's effort to actualize their culture of assessment and continuous learning and to use "extant local knowledge of user populations and library goals" to inform "homegrown studies to illuminate contextual nuance and character, customization that can be difficult to achieve when using externally developed survey instruments."10 unlv libraries are attempting to balance early experimentation and more extensive data-driven decision-making. the recently adopted strategic plan includes specific directions associated with both efforts. for experimentation, the direction states, "encourage staff to experiment with, explore, and share innovative and creative applications of technology."11 to that end, we have begun working with our colleagues to introduce easy, small-scale efforts designed to test the waters of mobile technology use through small pilot projects. "text-a-librarian" has been added to our existing group of virtual reference services, and we introduced a "text the call number and record" service in our library's opac in july 2009.
unlv libraries' strategic plan helps foster the healthy balance by directing library staff to "emphasize data collection and other evidence based approaches needed to assess efficiency and effectiveness of multiple modes and formats of access/ownership" and "collaborate to educate faculty and others regarding ways to incorporate library collections and services into education experiences for students."12 action items associated with these directions will help the libraries learn and apply information specific to their users as the libraries further adopt and integrate mobile technologies into their services. as we begin our planning in earnest, we look forward to our own set of valuable discoveries.
references
1. janna anderson and lee rainie, the future of the internet iii, pew internet & american life project, http://www.pewinternet.org/~/media//files/reports/2008/pip_futureinternet3.pdf (accessed july 20, 2009).
2. sam churchill, "smartphone users: 110m by 2013," blog entry, mar. 24, 2009, dailywireless.org, http://www.dailywireless.org/2009/03/24/smartphone-users-100m-by-2013 (accessed july 20, 2009).
3. mg siegler, "state of the iphone ecosystem: 40 million devices and 50,000 apps," blog entry, june 8, 2009, techcrunch, http://www.techcrunch.com/2009/06/08/40-million-iphones-and-ipod-touches-and-50000-apps (accessed july 20, 2009).
4. jenna wortham, "palm app catalog hits a million downloads," blog entry, june 24, 2009, new york times technology, http://bits.blogs.nytimes.com/2009/06/24/palm-app-catalog-hits-a-million-downloads (accessed july 20, 2009).
5. larry johnson, alan levine, and rachel smith, horizon report, 2009 edition (austin, tex.: the new media consortium, 2009), http://www.nmc.org/pdf/2009-horizon-report.pdf (accessed july 20, 2009).
6. university of california, davis, "more than 40% of campus students own smartphones, yearly tech survey says," technews, http://technews.ucdavis.edu/news2.cfm?id=1752 (accessed july 20, 2009).
7.
university of nevada las vegas, office of information technology, "student technology survey report: 2008–2009," http://oit.unlv.edu/sites/default/files/survey/surveyresults2008_students3_27_09.pdf (accessed july 20, 2009).
8. joan lippincott, "mobile technologies, mobile users: implications for academic libraries," arl bi-monthly report 261 (dec. 2008), http://www.arl.org/bm~doc/arl-br-261-mobile.pdf (accessed july 20, 2009); ellyssa kroski, "library mobile initiatives," library technology reports 44, no. 5 (july 2008): 33–38.
9. "unlv libraries strategic plan 2009–2011," http://www.library.unlv.edu/about/strategic_plan09-11.pdf (accessed july 20, 2009): 2.
10. char booth, informing innovation: tracking student interest in emerging library technologies at ohio university (chicago: association of college and research libraries, 2009), http://www.ala.org/ala/mgrps/divs/acrl/publications/digital/ii-booth.pdf (accessed july 20, 2009); "unlv libraries strategic plan 2009–2011," 6.
11. "unlv libraries strategic plan 2009–2011," 2.
12. ibid.
76 information technology and libraries | june 2010
in this paper we discuss the design space of methods for integrating information from web services into websites. we focus primarily on client-side mash-ups, in which code running in the user's browser contacts web services directly without the assistance of an intermediary server or proxy. to create such mash-ups, we advocate the use of "widgets," which are easy-to-use, customizable html elements whose use does not require programming knowledge. although the techniques we discuss apply to any web-based information system, we specifically consider how an opac can become both the target of web services integration and also a web service that provides information to be integrated elsewhere. we describe three widget libraries we have developed, which provide access to four web services. these libraries have been deployed by us and others.
our contributions are twofold: we give practitioners an insight into the trade-offs surrounding the appropriate choice of mash-up model, and we present the specific designs and use examples of three concrete widget libraries librarians can directly use or adapt. all software described in this paper is available under the lgpl open source license.
■■ background
web-based information systems use a client-server architecture in which the server sends html markup to the user's browser, which then renders this html and displays it to the user. along with html markup, a server may send javascript code that executes in the user's browser. this javascript code can in turn contact the original server or additional servers and include information obtained from them into the rendered content while it is being displayed. this basic architecture allows for myriad possible design choices and combinations for mash-ups. each design choice has implications for ease of use, customizability, programming requirements, hosting requirements, scalability, latency, and availability.
server-side mash-ups
in a server-side mash-up design, shown in figure 1, the mash-up server contacts the base server and each source when it receives a request from a client. it combines the information received from the base server and the sources and sends the combined html to the client. server-side mash-up systems that combine base and mash-up servers are also referred to as data mash-up systems. such data mash-up systems typically provide a web-based configuration front-end that allows users to select data sources, specify the manner in which they are combined, and create a layout for the entire mash-up.
godmar back and annette bailey
web services and widgets for library information systems
as more libraries integrate information from web services to enhance their online public displays, techniques that facilitate this integration are needed.
this paper presents a technique for such integration that is based on html widgets. we discuss three example systems (google book classes, tictoclookup, and majax) that implement this technique. these systems can be easily adapted without requiring programming experience or expensive hosting.
to improve the usefulness and quality of their online public access catalogs (opacs), more and more librarians include information from additional sources into their public displays.1 examples of such sources include web services that provide additional bibliographic information, social bookmarking and tagging information, book reviews, alternative sources for bibliographic items, table-of-contents previews, and excerpts. as new web services emerge, librarians quickly integrate them to enhance the quality of their opac displays. conversely, librarians are interested in opening the bibliographic, holdings, and circulation information contained in their opacs for inclusion into other web offerings they or others maintain. for example, by turning their opac into a web service, subject librarians can include up-to-the-minute circulation information in subject or resource guides. similarly, university instructors can use an opac's metadata records to display citation information ready for import into citation management software on their course pages. the ability to easily create such "mash-up" pages is crucial for increasing the visibility and reach of the digital resources libraries provide. although the technology to use web services to create mash-ups is well known, several practical requirements must be met to facilitate its widespread use. first, any environment providing for such integration should be easy to use, even for librarians with limited programming background. this ease of use must extend to environments that include proprietary systems, such as vendor-provided opacs.
second, integration must be seamless and customizable, allowing for local display preferences and flexible styling. third, the setup, hosting, and maintenance of any necessary infrastructure must be low-cost and should maximize the use of already available or freely accessible resources. fourth, performance must be acceptable, both in terms of latency and scalability.2
godmar back (gback@cs.vt.edu) is assistant professor, department of computer science, and annette bailey (afbailey@vt.edu) is assistant professor, university libraries, virginia tech, blacksburg.
web services and widgets for library information systems | back and bailey 77
examples of such systems include dapper and yahoo! pipes.3 these systems require very little programming knowledge, but they limit mash-up creators to the functionality supported by a particular system and do not allow the user to leverage the layout and functionality of an existing base server, such as an existing opac. integrating server-side mash-up systems with proprietary opacs as the base server is difficult because the mash-up server must parse the opac's output before integrating any additional information. moreover, users must now visit—or be redirected to—the url of the mash-up server. although some emerging extensible opac designs provide the ability to include information from external sources directly and easily, most currently deployed systems do not.4 in addition, those mash-up servers that do usually require server-side programming to retrieve and integrate the information coming from the mash-up sources into the page. the availability of software libraries and the use of special purpose markup languages may mitigate this requirement in the future. from a performance scalability point of view, the mash-up server is a bottleneck in server-side mash-ups and therefore must be made large enough to handle the expected load of end-user requests.
on the other hand, the caching of data retrieved from mash-up sources is simple to implement in this arrangement because only the mash-up server contacts these sources. such caching reduces the frequency with which requests have to be sent to sources if their data is cacheable, that is, if real-time information is not required. the latency in this design is the sum of the time required for the client to send a request to the mash-up server and receive a reply, plus the processing time required by the server, plus the time incurred by sending a request and receiving a reply from the last responding mash-up source. this model assumes that the mash-up server contacts all sources in parallel, or as soon as the server knows that information from a source should be included in a page. the availability of the system depends on the availability of all mash-up sources. if a mash-up source does not respond, the end user must wait until such failure is apparent to the mash-up server via a timeout. finally, because the mash-up server acts as a client to the base and source servers, no additional security considerations apply with respect to which sources may be contacted. there also are no restrictions on the data interchange format used by source servers as long as the mash-up server is able to parse the data returned.
client-side mash-ups
in a client-side setup, shown in figure 2, the base server sends only a partial website to the client, along with javascript code that instructs the client which other sources of information to contact. when executed in the browser, this javascript code retrieves the information from the mash-up sources directly and completes the mash-up. the primary appeal of client-side mashing is that no mash-up server is required, and thus the url that users visit does not change. consequently, the mash-up server is no longer a bottleneck.
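the latency bookkeeping described above can be made concrete with a small sketch; the functions and the example timings below are illustrative assumptions, not measurements from the paper:

```javascript
// illustrative sketch (not from the paper): latency of the two mash-up
// architectures discussed above, assuming sources are contacted in parallel.

// server-side: client/server round trip + server processing time + the
// slowest of the mash-up sources contacted by the server.
function serverSideLatency(clientRoundTripMs, processingMs, sourceMs) {
  return clientRoundTripMs + processingMs + Math.max(...sourceMs);
}

// client-side: the browser fetches the base page, then contacts all
// sources in parallel itself; the page is complete after the slowest one.
function clientSideLatency(basePageMs, sourceMs) {
  return basePageMs + Math.max(...sourceMs);
}

// with hypothetical timings, the two models come out similar, as the text
// argues, provided each source answers browsers and servers alike:
const sources = [80, 120, 250];
const serverSide = serverSideLatency(100, 20, sources); // 100 + 20 + 250 = 370
const clientSide = clientSideLatency(100, sources);     // 100 + 250 = 350
```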
equally important, no maintenance is required for this server, which is particularly relevant when libraries use turnkey solutions that restrict administrative access to the machine housing their opac. on the other hand, without a mash-up server, results from mash-up sources can no longer be centrally cached. thus the mash-up sources themselves must be sufficiently scalable to handle the expected number of requests.
figure 1. server-side mash-up construction
figure 2. client-side mash-up construction
as a load-reducing strategy, mash-up sources can label their results with appropriate expiration times to influence the caching of results in the clients' browsers. availability is increased because the mash-up degrades gracefully if some of the mash-up sources fail, since the information from the remaining sources can still be displayed to the user. assuming that requests are sent by the client in parallel or as soon as possible, and assuming that each mash-up source responds with similar latency to requests sent by the user's browser as to requests sent by a mash-up server, the latency for a client-side mash-up is similar to that of a server-side mash-up. however, unlike in the server-side approach, the page designer has the option to display partial results to the user while some requests are still in progress, or even to delay sending some requests until the user explicitly requests the data by clicking on a link or other element on the page. because client-side mash-ups rely on javascript code to contact web services directly, they are subject to a number of restrictions that stem from the security model governing the execution of javascript code in current browsers. this security model is designed to protect the user from malicious websites that could exploit client-side code and abuse the user's credentials to retrieve html or xml data from other websites to which a user has access.
such malicious code could then relay this potentially sensitive data back to the malicious site. to prevent such attacks, the security model allows the retrieval of html text or xml data only from sites within the same domain as the origin site, a policy commonly known as the same-origin policy. in figure 2, sources a and b come from the same domain as the page the user visits. the restrictions of the same-origin policy can be avoided by using the javascript object notation (json) interchange format.5 because client-side code may retrieve and execute javascript code served from any domain, web services that are not co-located with the origin site can make their results available using json. doing so facilitates their inclusion into any page, independent of the domain from which it is served (see source c in figure 2). many existing web services already provide an option to return data in json format, perhaps along with other formats such as xml. for web services that do not, a proxy server may be required to translate the data coming from the service into json. if the implementation of a proxy server is not feasible, the web service is usable only on pages within the same domain as the website using it. client-side mash-ups lend themselves naturally to enhancing the functionality of existing, proprietary opac systems, particularly when a vendor provides only limited extensibility. because they do not require server-side programming, the absence of a suitable vendor-provided server-side programming interface does not prevent their creation. oftentimes, vendor-provided templates or variables can be suitably adapted to send the necessary html markup and javascript code to the client. the amount of javascript code a librarian needs to write (or copy from a provided example) determines both the likelihood of adoption and the maintainability of a given mash-up creation.
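the same-origin workaround described above, loading the service's response as a script so that it may come from any domain, is commonly called jsonp. a minimal sketch of the pattern follows; the url layout and parameter names are illustrative assumptions, not any particular service's api:

```javascript
// minimal jsonp sketch of the pattern described above: a <script> tag may
// load code from any domain, so a json web service that wraps its response
// in a caller-named callback function can be used cross-domain.

// build a request url carrying the callback name as a query parameter
// (the base url and parameter names here are illustrative assumptions).
function buildJsonpUrl(base, params, callbackName) {
  const query = Object.entries(params)
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(v))
    .join("&");
  return base + "?" + query + "&callback=" + callbackName;
}

// in a browser, a widget library would register the callback globally and
// inject a script tag; the same-origin policy does not restrict this.
function requestJsonp(url, callbackName, handler) {
  window[callbackName] = handler; // the service's response invokes this
  const script = document.createElement("script");
  script.src = url;
  document.head.appendChild(script);
}
```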
the less javascript code there is to write, the larger the group of librarians who feel comfortable trying and adopting a given implementation. the approach of using html widgets hides the use of javascript almost entirely from the mash-up creator. html widgets represent specially composed markup, which will be replaced with information coming from a mash-up source when the page is rendered. because the necessary code is contained in a javascript library, adapters do not need to understand programming to use the information coming from the web service. finally, html widgets are also preferable for javascript-savvy users because they create a layer of abstraction over the complexity and browser dependencies inherent in javascript programming.
■■ the google book classes widget library
to illustrate our approach, we present a first example that allows the integration of data obtained from google book search into any website, including opac pages. google book search provides access to google's database of book metadata and contents. because of the company's book scanning activities as well as through agreements with publishers, google hosts scanned images of many book jackets as well as partial or even full previews for some books. many libraries are interested in using the book jackets when displaying opac records, in alerting their users if google can provide a partial or full view of an item a user selected in their catalog, or in both.6 this service can help users decide whether to borrow the book from the library.
the google book search dynamic link api
the google book search dynamic link api is a json-based web service through which google provides certain metadata for items it has indexed. it can be queried using bibliographic identifiers such as isbn, oclc number, or library of congress control number (lccn).
it returns a small set of data that includes the url of a book jacket thumbnail image, the url of a page with bibliographic information, the url of a preview page (if available), as well as information about the extent of any preview and whether the preview viewer can be embedded directly into other pages. table 1 shows the json result returned for an example isbn.
widgetization
to facilitate the easy integration of this service into websites without javascript programming, we developed a widget library. from the adapter's perspective, the use of these widgets is extremely simple. the adapter places html <span> or <div> tags into the page where they want data from google book search to display. these tags contain an html title
attribute that acts as an identifier to describe the bibliographic item for which information should be retrieved. it may contain the item's isbn, oclc number, or lccn. in addition, the tags carry one or more widget classes in the html class attribute to describe which processing should be done with the information retrieved from google to integrate it into the page. these widget classes can be combined with a list of traditional css classes in the class attribute to apply further style and formatting control.
examples
as an example, consider the following html an adapter may use in a page:
<span title="isbn:0596000278" class="gbs-thumbnail gbs-link-to-preview"></span>
when processed by the google book classes widget library, the class "gbs-thumbnail" instructs the widget to embed a thumbnail image of the book jacket for isbn 0596000278, and "gbs-link-to-preview" provides instructions to wrap the tag in a hyperlink pointing to google's preview page. the result is as if the server had contacted google's web service and constructed the html shown in example 1 in table 2, but the mash-up creator does not need to be concerned with the mechanics of contacting google's service and making the necessary manipulations to the document. example 2 in table 2 demonstrates a second possible use of the widget. in this example, the creator's intent is to display an image that links to google's information page if and only if google provides at least a partial preview for the book in question. this goal is accomplished by placing the image inside the span and using style="display:none" to make the span initially invisible. the span is made visible only if a preview is available at google, displaying the hyperlinked image. the full list of features supported by the google book classes widget library can be found in table 3.
integration with legacy opacs
the approach described thus far assumes that the mash-up creator has sufficient control over the html markup that is sent to the user.
this assumption does not always hold if the html is produced by a vendor-provided system, since such systems automatically generate most of the html used to display opac search results or individual bibliographic records. if the opac provides an extension system, such as a facility to embed customized links to external resources, it may be used to generate the necessary html by utilizing variables (e.g., "@#isbn@" for isbn numbers) set by the opac software. if no extension facility exists, accommodations by the widget library are needed to maintain the goal of not requiring any programming on the part of the adapter. we implemented such accommodations to facilitate the use of google book classes within a iii millennium opac.7 we used magic strings such as "isbn:millennium.record" in the title attribute to instruct the widget library to harvest the isbn from the current page via screen scraping. figure 3 provides an example of how a google book classes widget can be integrated into an opac search results page.

table 1. sample request and response for google book search dynamic link api
request:
http://books.google.com/books?bibkeys=isbn:0596000278&jscmd=viewapi&callback=process
json response:
process({
  "isbn:0596000278": {
    "bib_key": "isbn:0596000278",
    "info_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26source=gbs_viewapi",
    "preview_url": "http://books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26source=gbs_viewapi",
    "thumbnail_url": "http://bks4.books.google.com/books?id=ezqe1hh91q4c\x26printsec=frontcover\x26img=1\x26zoom=5\x26sig=acfu3u2d1usnxw9baqd94u2nc3quwhjn2a",
    "preview": "partial",
    "embeddable": true
  }
});

table 2. example of client-side processing by the google book classes widget library (each example shows the html written by the adapter, the browser display, and the resultant html after client-side processing)

table 3. supported google book classes
gbs-thumbnail: include an <img> embedding the thumbnail image
gbs-link-to-preview: wrap span/div in link to preview at google book search (gbs)
gbs-link-to-info: wrap span/div in link to info page at gbs
gbs-link-to-thumbnail: wrap span/div in link to thumbnail at gbs
gbs-embed-viewer: directly embed a viewer for the book's content into the page, if possible
gbs-if-noview: keep this span/div only if gbs reports that the book's viewability is "noview"
gbs-if-partial-or-full: keep this span/div only if gbs reports that the book's viewability is at least "partial"
gbs-if-partial: keep this span/div only if gbs reports that the book's viewability is "partial"
gbs-if-full: keep this span/div only if gbs reports that the book's viewability is "full"
gbs-remove-on-failure: remove this span/div if gbs doesn't return book information for this item

■■ the tictoclookup widget library
the tictocs journal table of contents service is a free online service that allows academic researchers and other users to keep up with newly published research by giving them access to thousands of journal tables of contents from multiple publishers.8 the tictocs consortium compiles and maintains a dataset that maps issns and journal titles to rss-feed urls for the journals' tables of contents.
the tictoclookup web service
we used the tictocs dataset to create a simple json web service called "tictoclookup" that returns rss-feed urls when queried by issn and, optionally, by journal title. table 4 shows an example query and response. to accommodate different hosting scenarios, we created two implementations of this tictoclookup: a standalone and a cloud-based implementation.
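following the request and response shapes shown in table 4, a client for the tictoclookup service could be sketched as below; everything beyond that one documented example (parameter handling, missing-record behavior) is an assumption:

```javascript
// sketch of a tictoclookup client following the request/response example
// shown in table 4; behavior beyond that example is an assumption.

// build a lookup url such as
// http://tictoclookup.appspot.com/0028-0836?title=nature&jsoncallback=process
function buildTictocUrl(issn, title, callbackName) {
  return "http://tictoclookup.appspot.com/" + issn +
         "?title=" + encodeURIComponent(title) +
         "&jsoncallback=" + callbackName;
}

// pull the rss feed urls out of a response shaped like table 4's example
function feedUrlsFrom(response) {
  return (response.records || []).map(function (record) {
    return record.rssfeed;
  });
}
```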
The standalone version is implemented as a Python web application conformant to the Web Server Gateway Interface (WSGI) specification. Hosting this version requires access to a web server that supports a WSGI-compatible environment, such as Apache's mod_wsgi. The Python application reads the ticTOCs dataset and responds to lookup requests for specific ISSNs. A cron job periodically downloads the most up-to-date version of the dataset.

The cloud version of the TicTocLookup service is implemented as a Google App Engine (GAE) application. It uses the highly scalable and highly available GAE Datastore to store ticTOCs data records. GAE applications run on servers located in Google's regional data centers so that requests are handled by a data center geographically close to the requesting client. As of June 2009, Google hosting of GAE applications is free, which includes a free allotment of several computational resources. For each application, GAE allows quotas of up to 1.3 million requests and the use of up to 10 GB of bandwidth per twenty-four-hour period. Although this capacity is sufficient for the purposes of many small and medium-size institutions, additional capacity can be purchased at a small cost.

Widgetization

To facilitate the easy integration of this service into websites without JavaScript programming, we developed a widget library. Like Google Book Classes, this widget library is controlled via HTML attributes associated with HTML span or div tags that are placed into the page where the user decides to display data from the TicTocLookup service. The HTML title
attribute identifies the journal by its ISSN, or by its ISSN and title. As with Google Book Classes, the HTML class attribute describes the desired processing; it may also contain traditional CSS classes.

Figure 3. Sample use of Google Book Classes in an OPAC results page

Table 4. Sample request and response for the TicTocLookup web service

Request:
http://tictoclookup.appspot.com/0028-0836?title=nature&jsoncallback=process

JSON response:
process({
  "lastmod": "Wed Apr 29 05:42:36 2009",
  "records": [{
    "title": "Nature",
    "rssfeed": "http://www.nature.com/nature/current_issue/rss"
  }],
  "issn": "00280836"
});

Example

Consider the following HTML an adapter may use in a page:

<span class="tictoc-link tictoc-preview tictoc-alternate-link"
      title="issn:0028-0836" style="display:none">
  Click to subscribe to table of contents for this journal
</span>

When processed by the TicTocLookup widget library, the class "tictoc-link" instructs the widget to wrap the span in a link to the RSS feed at which the table of contents is published, allowing users to subscribe to it. The class "tictoc-preview" associates a tooltip element with the span, which displays the first entries of the feed when the user hovers over the link. We use the Google Feeds API, another JSON-based web service, to retrieve a cached copy of the feed. The "tictoc-alternate-link" class places an alternate link into the current document, which in some browsers triggers the display of the RSS feed icon in the status bar. The span element, which is initially invisible, is made visible if and only if the TicTocLookup service returns information for the given pair of ISSN and title.

Figure 4. Sample use of TicTocLookup classes

Figure 4 provides a screenshot of the display if the user hovers over the link. As with Google Book Classes, the mash-up creator does not need to be concerned with the mechanics of contacting the TicTocLookup web service and making the necessary manipulations to the document. Table 5 provides a complete overview of the classes TicTocLookup supports.
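On the server side, the `jsoncallback` parameter shown in Table 4 implies that the service wraps its JSON payload in the caller-supplied function name. The article does not reproduce the service's source, so the following is a minimal WSGI-style sketch under assumed names (`tictoc_app`, an in-memory `DATASET` holding one ticTOCs record) rather than the actual implementation.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory slice of the ticTOCs dataset: ISSN -> record.
DATASET = {
    "0028-0836": {"title": "Nature",
                  "rssfeed": "http://www.nature.com/nature/current_issue/rss"},
}

def tictoc_app(environ, start_response):
    """Minimal WSGI application sketch: GET /<ISSN>?jsoncallback=<fn>
    returns the matching record wrapped as JSONP."""
    issn = environ.get("PATH_INFO", "/").lstrip("/")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    callback = query.get("jsoncallback", ["callback"])[0]
    record = DATASET.get(issn)
    payload = {"issn": issn.replace("-", ""),
               "records": [record] if record else []}
    body = "%s(%s);" % (callback, json.dumps(payload))
    start_response("200 OK", [("Content-Type", "application/javascript")])
    return [body.encode("utf-8")]

# Exercising the app directly, without a web server:
def call(path, qs):
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(tictoc_app({"PATH_INFO": path, "QUERY_STRING": qs},
                               start_response))
    return captured["status"], body.decode("utf-8")

status, body = call("/0028-0836", "jsoncallback=process")
```

Because WSGI applications are plain callables, the same lookup logic could back either the standalone mod_wsgi deployment or a hosted variant.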
Integration with Legacy OPACs

Similar to the Google Book Classes widget library, we implemented provisions that allow the use of TicTocLookup classes on pages over which the mash-up creator has limited control. For instance, specifying a title attribute of "issn:millennium.issnandtitle" harvests the ISSN and journal title from the III Millennium's record display page.

■■ MAJAX

Whereas the widget libraries discussed thus far integrate external web services into an OPAC display, MAJAX is a widget library that integrates information coming from an OPAC into other pages, such as resource guides or course displays. MAJAX is designed for use with a III Millennium integrated library system (ILS), whose vendor does not provide a web-services interface. The techniques we used, however, extend to other OPACs as well.

Table 5. Supported TicTocLookup classes

tictoc-link: wrap span/div in link to table of contents
tictoc-preview: display tooltip with preview of current entries
tictoc-embed-n: embed preview of first n entries
tictoc-alternate-link: insert an alternate link element into the document
tictoc-append-title: append the title of the journal to the span/div

Like many legacy OPACs, Millennium not only lacks a web-services interface, it lacks any programming interface to the records contained in the system and does not provide access to the database or file system of the machine housing the OPAC.

Providing OPAC Data as a Web Service

We implemented two methods to access records from the Millennium OPAC using bibliographic identifiers such as ISBN, OCLC number, bibliographic record number, and item title. Both methods provide access to complete MARC records and holdings information, along with locations and real-time availability for each held item. MAJAX extracts this information via screen scraping from the MARC record display page.
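The article does not show the Millennium markup that MAJAX scrapes, so the HTML fragment and field pattern below are invented for illustration; the sketch only conveys the general screen-scraping technique of pattern-matching the record display page to recover MARC fields.

```python
import re

# Invented sample of what MARC fields might look like in a legacy
# OPAC's HTML record display; the real Millennium markup differs.
SAMPLE_PAGE = """
<td class="marcTag">020</td><td class="marcData">|a 0596000278</td>
<td class="marcTag">245</td><td class="marcData">|a Sample title :</td>
"""

def scrape_marc_fields(html):
    """Collect (tag, data) pairs from the record display markup.
    Any change to the OPAC's output format breaks this pattern,
    which is the maintenance cost inherent to screen scraping."""
    pattern = re.compile(
        r'class="marcTag">(\d{3})</td>.*?class="marcData">([^<]*)<',
        re.DOTALL)
    return {tag: data.strip() for tag, data in pattern.findall(html)}

fields = scrape_marc_fields(SAMPLE_PAGE)
# Pull the ISBN out of the 020 field by dropping the subfield marker.
isbn = fields.get("020", "").replace("|a", "").strip()
```

Keeping the pattern in one place makes the yearly-or-less format changes mentioned below a localized fix rather than a rewrite.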
As with all screen-scraping approaches, the code performing the scraping must be updated if the output format provided by the OPAC changes. In our experience, such changes occur at a frequency of less than once per year. The first method, MAJAX 1, implements screen scraping using JavaScript code contained in a document placed in a directory on the server (/screens) that is normally used for supplementary resources, such as images. This document is included in the target page as a hidden HTML